{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) 2024 Massachusetts Institute of Technology\n", "\n", "SPDX-License-Identifier: MIT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Launching a Set of Simulations\n", "\n", "So far, we've explored how to use `madlib` to launch a single satellite/sensor simulation at a time. The true power of `MaDDG`, however, is that it allows us to conduct large-scale experiments with randomly varied parameters.\n", "\n", "This notebook will walk through the following steps for launching a custom batch of simulation experiments:\n", "1) Defining the experiment parameters,\n", "2) Preparing the simulator task function,\n", "3) Launching the experiments in parallel, and\n", "4) Analyzing the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining the Experiment Parameters\n", "\n", "Each simulation involves a network of sensors observing a satellite, and that satellite may or may not be performing a maneuver during the period of observation. The first step to creating our set of simulations is to define these parameters, or at least the distributions from which the parameters will be pulled." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating the Sensor Network\n", "\n", "One thing that every simulation in our set will have in common is that they will all use the same sensor network. `MaDDG` comes packaged with a sample network (`configs/sample_sensor_network.yaml`) that offers good global coverage of the GEO belt. Let's go ahead and use that.\n", "\n", "All we need to do is point our code to the YAML file containing the sensor parameters. To better understand these parameters, refer to the Example 3 notebook on sensor collections." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "sensor_yaml = Path(\"../configs/sample_sensor_network.yaml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating the Satellite Distributions\n", "\n", "While we will create the `Satellite` objects themselves (including their orbital parameters) within the simulator task function, we can define the types and vector distributions of their maneuvers now. That way, each separate experiment can create a unique satellite, but they will all have the same general class of behavior.\n", "\n", "For this example, we'll consider impulsive maneuvers with in-track and cross-track components." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# First, define the maneuver type as a string (\"impulse\", \"continuous\", or \"all\")\n", "mtype = \"impulse\"\n", "\n", "# The vector components of the thrust vector will be pulled from normal distributions, the mean and width of which we define here.\n", "# The values are entered as a 3-element list with the radial, in-track, and cross-track components of the delta-v (respectively)\n", "# in km/s.\n", "dv_ric_mean_kms = (0.0, 0.0, 0.0)\n", "dv_ric_std_kms = (0.0, 0.1, 1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining the Experiment Timing\n", "\n", "We can control the time and day of the simulated epoch at which our experiments will start, as well as the simulated duration of the experiments. For this example, we'll use the arbitrary date of 60197.5 (again, all dates are in MJD format!). For the simulation duration, we'll go with 3 days." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "start_mjd = 60197.5\n", "sim_duration_days = 3.0  # Duration is defined in days" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining the Number of Simulations and Launch Controls\n", "\n", "The next thing we need to do is decide how many unique simulations to run. Note that we define this as a number of pairs: each pair contains one maneuvering and one non-maneuvering satellite (with different orbital parameters). This is designed to give us a balanced dataset for classification tasks.\n", "\n", "We'll keep this example lightweight and run 5 pairs of simulations, for a total of 10." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "num_sim_pairs = 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we need to decide where the outputs of our simulations will go, and whether to run them in parallel. Since we're running 10 simulations, parallelizing them will be a good idea. Luckily, the [submitit](https://hydra.cc/docs/plugins/submitit_launcher/) plugin for [hydra](https://hydra.cc/) (both of which are installed with this project) makes it easy to distribute jobs across available CPUs.\n", "\n", "Feel free to change the output directory in this example, but the one that we'll create is already added to the project's `.gitignore`, so it shouldn't cause any issues." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Create the output directory\n", "output_path = \"example_outputs\"\n", "output_dir = Path(output_path)\n", "output_dir.mkdir(exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The multirun directory is where hydra-zen saves experiment files. We'll put it under `example_outputs/`. 
We can raise a flag to automatically delete the multirun directory when the jobs are complete, but we'll preserve them this time for illustrative purposes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "multirun_dir = output_dir / \"multirun\"\n", "multirun_path = str(multirun_dir) # We need the string version of this path\n", "\n", "rm_multirun_root = (\n", " False # If True, multirun directories created by this job will be deleted\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need a configuration JSON to tell `MaDDG` to use the `submitit` plugin to parallelize the job. We just want to run locally, so we'll set the hydra launcher to `submitit_local`.\n", "\n", "We'll also set the option `hydra.job.chdir` to `True` to make sure the outputs of each simulation are saved to their own respective multirun directories." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "submitit_json = [\n", " \"hydra.job.chdir=True\",\n", " \"hydra/launcher=submitit_local\",\n", " \"hydra.launcher.nodes=1\",\n", " \"hydra.launcher.cpus_per_task=2\",\n", " \"hydra.launcher.tasks_per_node=10\",\n", " \"hydra.launcher.mem_gb=16\",\n", "]\n", "\n", "submitit_file = Path(\"example_outputs\") / \"submitit.json\"\n", "with open(submitit_file, \"w\") as f:\n", " json.dump(submitit_json, f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparing the Simulator Task Function\n", "\n", "The task function is a concept borrowed from [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/index.html), and it's the actual code that will be run for each simulation. 
Our task function should instantiate the sensor network and random satellite, propagate the orbit and record observations throughout the simulation duration, compute metrics on the produced observations, and return those metrics as a [pandas](https://pandas.pydata.org/) DataFrame. `MaDDG` will automatically collate all of the DataFrames created by the experiments and produce a single CSV as our final output.\n", "\n", "Our task function must accept the following input parameters:\n", "- `seq_id` - An integer ID for the experiment\n", "- `sensor_params` - A dictionary containing the parameters for each sensor in our network\n", "- `maneuver_type` - An integer defining the type of maneuver for the satellite (0=None, 1=Impulsive, 2=Continuous)\n", "- `sim_duration_days` - A float defining the length of the simulation in days\n", "- `start_mjd` - A float (or None, as discussed above) defining the simulation's starting epoch in MJD format\n", "- `dv_ric_mean_kms` - A tuple of 3 floats defining the means of the thrust vector distributions (see \"Creating the Satellite Distributions\" above)\n", "- `dv_ric_std_kms` - Same as `dv_ric_mean_kms`, but for the distribution widths\n", "\n", "Our simulation task is to calculate **residuals**, the differences between the satellite's expected position and its actual position at each observation time." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from typing import Tuple" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import madlib\n", "from maddg._residuals import calculate_residuals\n", "import numpy as np\n", "\n", "\n", "def simulator_task(\n", "    seq_id: int,\n", "    sensor_params: dict,\n", "    maneuver_type: int,\n", "    sim_duration_days: float,\n", "    start_mjd: float,\n", "    dv_ric_mean_kms: Tuple[float, float, float],\n", "    dv_ric_std_kms: Tuple[float, float, float],\n", "    **kwargs,\n", "):\n", "    # Define a SensorCollection object from the given parameters\n", "    sensors = [madlib.GroundOpticalSensor(**params) for key, params in sensor_params.items()]\n", "    sensor_network = madlib.SensorCollection(sensors)\n", "\n", "    # Timing\n", "    epoch = start_mjd\n", "\n", "    # Create the satellite (a GEO object at a random longitude)\n", "    sat_longitude = 360 * np.random.random()\n", "    sat_observed = madlib.Satellite.from_GEO_longitude(sat_longitude, epoch)\n", "\n", "    maneuver = None\n", "    maneuver_mjd = None\n", "    maneuver_r_kms = None\n", "    maneuver_i_kms = None\n", "    maneuver_c_kms = None\n", "\n", "    # For maneuvering cases, create a random maneuver vector\n", "    if maneuver_type == 1:\n", "        # Pick a random maneuver time during the simulation\n", "        maneuver_mjd = epoch + sim_duration_days * np.random.random()\n", "\n", "        # Calculate the thrust vector using the input distributions\n", "        mean_rad, mean_in, mean_crs = dv_ric_mean_kms\n", "        std_rad, std_in, std_crs = dv_ric_std_kms\n", "\n", "        maneuver_r_kms = mean_rad + std_rad * np.random.randn()\n", "        maneuver_i_kms = mean_in + std_in * np.random.randn()\n", "        maneuver_c_kms = mean_crs + std_crs * np.random.randn()\n", "\n", "        # Define the ImpulsiveManeuver object, converting from km/s to m/s\n", "        man_dv = np.array([maneuver_r_kms, maneuver_i_kms, maneuver_c_kms]) / 1000\n", "        maneuver = madlib.ImpulsiveManeuver(maneuver_mjd, man_dv)\n", "\n", "    sat_observed.maneuver = maneuver\n", "\n", "    # Observe and calculate residuals\n", "    residual_df = calculate_residuals(\n", "        sensors=sensor_network,\n", "        satellite=sat_observed,\n", "        sim_duration_days=sim_duration_days,\n", "        t_start_mjd=epoch,\n", "    )\n", "\n", "    # Append maneuver information to the output dataframe\n", "    if residual_df is not None:\n", "        residual_df[\"Maneuver\"] = maneuver_type\n", "        residual_df[\"Sequence\"] = int(seq_id)\n", "        residual_df[\"Maneuver_MJD\"] = maneuver_mjd\n", "        residual_df[\"Maneuver_DV_Radial_KmS\"] = maneuver_r_kms\n", "        residual_df[\"Maneuver_DV_InTrack_KmS\"] = maneuver_i_kms\n", "        residual_df[\"Maneuver_DV_CrossTrack_KmS\"] = maneuver_c_kms\n", "\n", "    # Return the requisite dataframe\n", "    return residual_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launching the Experiments\n", "Next we use the `MaDDG` launcher to run the simulations we've defined in the manner we've configured. There are several inputs to this function, many of which are optional or exclusive to continuous maneuver cases. We encourage you to read the function's documentation for a full explanation of the arguments.\n", "\n", "For now, we introduce no parameters that were not discussed above. Executing the cell below will launch the simulations, which may take several minutes to complete depending on your computer's resources." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO :: mtype = 'impulse'\n", "INFO :: sim_duration_days = 3.0\n", "INFO :: sims_per_task = 1\n", "[2024-06-18 16:58:24,003][HYDRA] Submitit 'local' sweep output dir : example_outputs/multirun/2024-06-18/16-58-22\n", "[2024-06-18 16:58:24,005][HYDRA] \t#0 : maneuver_type=0 seq_id=0\n", "[2024-06-18 16:58:24,014][HYDRA] \t#1 : maneuver_type=0 seq_id=1\n", "[2024-06-18 16:58:24,023][HYDRA] \t#2 : maneuver_type=0 seq_id=2\n", "[2024-06-18 16:58:24,034][HYDRA] \t#3 : maneuver_type=0 seq_id=3\n", "[2024-06-18 16:58:24,044][HYDRA] \t#4 : maneuver_type=0 seq_id=4\n", "[2024-06-18 16:58:24,053][HYDRA] \t#5 : maneuver_type=1 seq_id=0\n", "[2024-06-18 16:58:24,062][HYDRA] \t#6 : maneuver_type=1 seq_id=1\n", "[2024-06-18 16:58:24,071][HYDRA] \t#7 : maneuver_type=1 seq_id=2\n", "[2024-06-18 16:58:24,080][HYDRA] \t#8 : maneuver_type=1 seq_id=3\n", "[2024-06-18 16:58:24,090][HYDRA] \t#9 : maneuver_type=1 seq_id=4\n" ] } ], "source": [ "from maddg._sim_launcher import launcher\n", "\n", "launcher(\n", " simulator_method=simulator_task,\n", " mtype=mtype,\n", " num_sim_pairs=num_sim_pairs,\n", " sensor_yaml=sensor_yaml,\n", " outdir=output_dir,\n", " dv_ric_mean_kms=dv_ric_mean_kms,\n", " dv_ric_std_kms=dv_ric_std_kms,\n", " submitit=str(submitit_file),\n", " multirun_root=multirun_path,\n", " rm_multirun_root=rm_multirun_root,\n", " start_mjd=start_mjd,\n", " sim_duration_days=sim_duration_days,\n", " random_seed=0,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explaining the Outputs\n", "Once the jobs have completed, let's stop and take a look at the files that have been produced in our output directory:\n", "\n", "- **multirun** directory\n", " - This will have nested folders named with the date and time that the simulations were launched.\n", " - Inside the date/time folders will be 10 folders labeled 0-9 and a 
**.submitit** directory.\n", " - The **.submitit** folder contains the stderr and stdout of the individual jobs. This is useful for debugging.\n", " - Each numbered folder contains a **.hydra** folder.\n", " - Inside, you'll find various YAML files describing the configuration of that specific experiment.\n", " - The numbered folders also contain **zen_launch.log** (a log of the job submissions that can be ignored) and **output.csv**, which holds the measurements from that simulation.\n", "- **complete.csv**\n", " - This is the final data product, concatenating the results from the individual simulations.\n", "- **errors.txt**\n", " - If any simulations encountered errors, they will be recorded here.\n", "- **multirun.yaml**\n", " - Another hydra YAML summarizing the jobs we launched. You can ignore this, but it's a useful summary and can be used to reproduce the experiment in the future.\n", "- **submitit.json**\n", " - The JSON we created earlier in this notebook to configure the submitit parallelization of our jobs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing the Experiments\n", "Let's take a look at the collated results of our simulations by reading the **complete.csv** file:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | MJD | RA Arcsec | DEC Arcsec | SensorID | Maneuver | Sequence | Maneuver_MJD | Maneuver_DV_Radial_KmS | Maneuver_DV_InTrack_KmS | Maneuver_DV_CrossTrack_KmS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60198.111433 | 0.265156 | 0.116136 | C1 | 0 | 0 | NaN | NaN | NaN | NaN |
| 1 | 60198.184929 | -2.286885 | 0.196002 | C1 | 0 | 0 | NaN | NaN | NaN | NaN |
| 2 | 60198.258424 | -1.112096 | 0.065560 | C1 | 0 | 0 | NaN | NaN | NaN | NaN |
| 3 | 60198.331920 | -0.699248 | 0.570814 | C1 | 0 | 0 | NaN | NaN | NaN | NaN |
| 4 | 60198.405415 | 0.658723 | 0.124080 | C1 | 0 | 0 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 707 | 60199.680953 | -8.950171 | 54.328034 | B3 | 1 | 4 | 60199.388352 | 0.0 | 0.051197 | 0.742526 |
| 708 | 60199.754449 | -16.675548 | 43.210577 | B3 | 1 | 4 | 60199.388352 | 0.0 | 0.051197 | 0.742526 |
| 709 | 60199.827944 | -28.241519 | 21.618071 | B3 | 1 | 4 | 60199.388352 | 0.0 | 0.051197 | 0.742526 |
| 710 | 60199.901440 | -41.482868 | -6.093304 | B3 | 1 | 4 | 60199.388352 | 0.0 | 0.051197 | 0.742526 |
| 711 | 60199.974935 | -53.776856 | -31.163723 | B3 | 1 | 4 | 60199.388352 | 0.0 | 0.051197 | 0.742526 |
712 rows × 10 columns
\n", "