Project Case Study

Orbital Refueling Simulator

A synthetic spacecraft telemetry system for testing hybrid anomaly monitoring and grounded LLM explanations across mission phases.

Autonomous propellant transfer is a useful monitoring problem because several subsystems move at the same time: robotic alignment, docking loads, seal pressure, transfer pressure, flow rate, pump behavior, thermal state, and spacecraft attitude. Some faults are obvious threshold violations. Others only look suspicious when multiple signals drift together.

I built this project around that split. The simulator generates a complete 410-second refueling mission, deterministic rules own the hard-limit safety checks, and a phase-aware anomaly detector provides advisory scoring for multivariate drift. I then added RefuelGuard-LM, a fine-tuned explainer layer that turns grounded rule, score, and attribution outputs into concise operator-facing explanations. The result is a small but complete telemetry pipeline with scenario generation, scoring, attribution, LLM fine-tuning, regression tests, and model cards that clearly state the limits of synthetic evaluation.

Python
scikit-learn
IsolationForest
LLM Fine-Tuning
LoRA
pandas
pytest
Telemetry Simulation


Figure: Orbital refueling telemetry dashboard preview showing mission phase, scenario alerts, signal charts, and anomaly score.

Problem

A single global anomaly detector is the wrong mental model for this domain. During approach, zero propellant flow is normal. During main transfer, zero flow is suspicious. A pressure value that looks reasonable in one phase can be wrong in another. The monitoring system needs phase context before it can decide whether telemetry is nominal.

The other constraint is authority. If bus voltage drops too low, pump current spikes, or seal pressure falls after the seal is established, a deterministic rule should fire regardless of what an ML model thinks. The model can help surface softer patterns, but it should not override hard engineering checks.

Detection challenge

Represent both faults that are clear threshold breaches, such as unexpected flow during the leak check, and faults that show up mainly as multivariate drift, such as pressure sensor bias or propellant slosh.

Design principle

Keep deterministic safety rules independent from ML scoring. Rules make explicit engineering checks; ML provides advisory pattern recognition and supporting evidence.
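
One way to picture this principle is a combiner in which only the rule engine can set severity, while the ML score can at most request a review. This is a minimal sketch under assumptions: the Assessment type, the assess function, and the review_threshold value are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    severity: str          # "none", "warning", or "critical"; set by rules only
    advisory_score: float  # ML score in [0, 1]; informational
    needs_review: bool     # True when either layer wants operator attention


def assess(rule_severity: Optional[str], ml_score: float,
           review_threshold: float = 0.7) -> Assessment:
    """Combine both layers without letting the ML score touch severity.

    review_threshold is an illustrative value, not a project setting.
    """
    severity = rule_severity or "none"
    needs_review = severity != "none" or ml_score >= review_threshold
    return Assessment(severity, ml_score, needs_review)
```

With this shape, a critical rule alert stays critical even when the ML score is near zero, and a high ML score without a rule alert flags a review without inventing a severity.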

Simulator Design

The core of the project is simulator.py. It creates a reproducible mission timeline with phase-specific nominal behavior, then injects scenario anomalies at the operational phase where they make sense.

Phase model: the mission moves through approach, arm alignment, docking, seal check, pressure equalization, main transfer, leak check, disconnect, and retreat.
Physical consistency: the generator tracks mass transfer and updates donor pressure, receiver pressure, line pressure, flow rate, pump current, temperatures, attitude, arm state, and interface loads with phase-specific behavior.
Scenario injection: each anomaly modifies the signals it should affect: partial blockage lowers flow while raising line pressure and pump current, arm misalignment raises end-effector error and interface loads, and unstable slosh oscillates flow, pressure, and propellant temperature.
Reproducibility: fixed seeds make the scenarios stable enough for tests and validation reports while still preserving realistic noise.
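
The points above can be sketched as a seeded generator that walks the phase list and applies a scenario only in the phase where it makes sense. Only the phase order and the 410-second total come from the description; the per-phase durations, signal values, and the generate_mission name are assumptions for illustration, and the real simulator.py tracks many more signals.

```python
import random

# Hypothetical per-phase durations in seconds. They sum to 410, but the
# individual values are assumptions, not the project's configuration.
PHASES = [
    ("approach", 60), ("arm_alignment", 40), ("docking", 40),
    ("seal_check", 30), ("pressure_equalization", 40), ("main_transfer", 120),
    ("leak_check", 30), ("disconnect", 20), ("retreat", 30),
]


def generate_mission(scenario="nominal", seed=0):
    """Return one row per second with phase-specific nominal behavior."""
    rng = random.Random(seed)  # fixed seed keeps the noise reproducible
    rows, t = [], 0
    for phase, duration in PHASES:
        for _ in range(duration):
            transferring = phase == "main_transfer"
            # Illustrative nominal values plus Gaussian noise.
            flow = (2.0 if transferring else 0.0) + rng.gauss(0, 0.02)
            line_pressure = (300.0 if transferring else 100.0) + rng.gauss(0, 1.0)
            pump_current = (8.0 if transferring else 0.5) + rng.gauss(0, 0.05)
            if scenario == "partial_blockage" and transferring:
                # Blockage lowers flow while raising line pressure and current.
                flow *= 0.6
                line_pressure *= 1.25
                pump_current *= 1.3
            rows.append({"t": t, "phase": phase, "flow_rate": flow,
                         "line_pressure": line_pressure,
                         "pump_current": pump_current})
            t += 1
    return rows
```

Because the random draws do not depend on the scenario, the nominal and faulted runs for a given seed share the same noise, which makes scenario differences easy to isolate in tests.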

Hybrid Monitoring Architecture

I split monitoring into two independent layers so the behavior is easier to reason about and safer to explain.

Deterministic rule engine

rules.py defines explicit warning and critical thresholds for attitude error, bus voltage, seal pressure, leak-check flow, line pressure, pump current, interface force, and arm position. Rules can be scoped to phases so expected ramps do not create false alerts.
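
A phase-scoped rule can be sketched as a small record with a predicate and an optional set of phases. The Rule type, the evaluate_rules function, the signal names, and every threshold below are illustrative assumptions, not the contents of rules.py.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    signal: str
    severity: str                            # "warning" or "critical"
    violated: Callable[[float], bool]
    phases: Optional[FrozenSet[str]] = None  # None means every phase


# Illustrative thresholds only.
RULES = [
    Rule("bus_voltage_low", "bus_voltage", "critical", lambda v: v < 24.0),
    Rule("leak_check_flow", "flow_rate", "critical",
         lambda v: abs(v) > 0.05, frozenset({"leak_check"})),
    Rule("line_pressure_high", "line_pressure", "warning",
         lambda v: v > 350.0, frozenset({"main_transfer"})),
]


def evaluate_rules(row, rules=RULES):
    """Return one alert dict per rule that fires for this telemetry row."""
    alerts = []
    for rule in rules:
        if rule.phases is not None and row["phase"] not in rule.phases:
            continue  # rule is out of scope in this phase
        if rule.signal in row and rule.violated(row[rule.signal]):
            alerts.append({"t": row.get("t"), "phase": row["phase"],
                           "rule": rule.name, "severity": rule.severity})
    return alerts
```

Scoping the flow rule to the leak check means the expected flow during main transfer never raises that alert, while an unscoped rule such as the bus voltage check applies throughout the mission.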

Event grouping

Raw timestamp-level alerts are grouped into event windows by rule and phase. This prevents oscillating signals from flooding the operator view with dozens of nearly identical rows.
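
The grouping step can be sketched as follows, assuming each raw alert is a dict with t, rule, phase, and severity keys. The max_gap tolerance for merging nearby timestamps is a hypothetical parameter, as is the choice to keep the highest severity seen in a window.

```python
SEVERITY_RANK = {"warning": 1, "critical": 2}


def group_alerts(alerts, max_gap=2):
    """Merge per-timestep alerts into event windows keyed by (rule, phase).

    Alerts for the same key that are at most max_gap seconds apart are
    folded into one window; max_gap is an illustrative assumption.
    """
    events, latest = [], {}
    for alert in sorted(alerts, key=lambda a: a["t"]):
        key = (alert["rule"], alert["phase"])
        event = latest.get(key)
        if event is not None and alert["t"] - event["end"] <= max_gap:
            # Extend the open window and keep the highest severity seen.
            event["end"] = alert["t"]
            event["count"] += 1
            if SEVERITY_RANK[alert["severity"]] > SEVERITY_RANK[event["severity"]]:
                event["severity"] = alert["severity"]
        else:
            event = {"rule": alert["rule"], "phase": alert["phase"],
                     "severity": alert["severity"],
                     "start": alert["t"], "end": alert["t"], "count": 1}
            latest[key] = event
            events.append(event)
    return events
```

An oscillating signal that crosses a threshold every other second then produces one row with a start, an end, and a count instead of a separate row per crossing.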

Phase-aware ML detector

detector.py trains one IsolationForest per phase on nominal telemetry. At inference time, each row is scored against the model for its current phase instead of a global baseline.
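
A minimal sketch of per-phase training and lookup with scikit-learn follows. The feature list, function names, and hyperparameters are assumptions; detector.py may differ in all of them.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical feature subset; the real detector likely uses more signals.
FEATURES = ["flow_rate", "line_pressure", "pump_current"]


def fit_phase_models(rows, features=FEATURES, seed=0):
    """Fit one IsolationForest per phase on nominal telemetry rows."""
    by_phase = {}
    for row in rows:
        by_phase.setdefault(row["phase"], []).append([row[f] for f in features])
    return {
        phase: IsolationForest(n_estimators=100, random_state=seed).fit(np.asarray(X))
        for phase, X in by_phase.items()
    }


def raw_score(models, row, features=FEATURES):
    """Score a row against the model for its own phase.

    Returns the decision_function value (negative means more anomalous),
    or None when no model exists for the row's phase.
    """
    model = models.get(row["phase"])
    if model is None:
        return None
    x = np.asarray([[row[f] for f in features]])
    return float(model.decision_function(x)[0])
```

With models fitted this way, a zero-flow row is compared against approach data when it occurs during approach and against transfer data when it occurs during main transfer, so the same reading can look ordinary in one phase and unusual in the other.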

Score normalization

The detector maps threshold-relative IsolationForest outputs into a 0 to 1 anomaly score with a sigmoid transform. The score is useful for ranking and inspection, not as a calibrated probability.
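
As a sketch of this mapping, assume the threshold-relative output behaves like scikit-learn's decision_function, where zero sits at the inlier/outlier boundary and negative values are more anomalous. The steepness constant is a hypothetical tuning value, and returning 0.0 when no phase model exists mirrors the unknown-phase behavior the regression tests describe.

```python
import math


def anomaly_score(decision_value, steepness=20.0):
    """Map a threshold-relative score to (0, 1) with a sigmoid.

    decision_value: negative means more anomalous, 0 is the boundary.
    steepness: illustrative constant controlling how sharply the score
    moves away from 0.5 near the boundary.
    """
    if decision_value is None:  # no model for this phase
        return 0.0
    z = steepness * decision_value
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(z))
```

A row exactly at the boundary maps to 0.5, rows the forest considers outliers map above 0.5, and inliers map below it. The output preserves ordering, which is what ranking needs, but nothing in the transform calibrates it as a probability.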

Explainability And Validation

Because this is a monitoring system, a high score by itself is not enough. The project includes lightweight attribution, an optional fine-tuned explanation model, and a repeatable validation script so each scenario has inspectable evidence.

Perturbation attribution: for anomalous rows, explainer.py replaces each feature with that phase's nominal mean and measures how much the anomaly score drops. Large drops identify the signals pushing the score upward.
Scenario validation: scripts/validate_scenarios.py trains or loads the default detector, evaluates every scenario, and writes a compact CSV with max score, mean score, anomaly rate, rule alert counts, highest severity, and top contributing signal.
Regression tests: the pytest suite covers telemetry generation, rule behavior, detector scoring, phase handling, and explanation output, including edge cases such as unknown phases receiving a score of zero.
Model cards: MODEL_CARD.md and the RefuelGuard-LM model card document training data, intended use, score interpretation, limitations, and the validation needed before anything like this could be considered for real telemetry.
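
The perturbation attribution described above can be sketched in a few lines. The attribute function and the toy_score stand-in are hypothetical; in the project the score would come from the phase-aware detector, and nominal_means would be the means for the row's current phase.

```python
def attribute(score_fn, row, nominal_means, features):
    """Rank features by how much the score drops when each one is
    replaced with its nominal mean for the row's phase."""
    base = score_fn(row)
    drops = {}
    for feature in features:
        patched = dict(row)                       # leave the input untouched
        patched[feature] = nominal_means[feature]
        drops[feature] = base - score_fn(patched)
    # Largest drop first: those signals push the score upward the most.
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


def toy_score(row):
    """Stand-in scorer for the demo only, not the project's detector."""
    d = abs(row["flow_rate"] - 2.0) / 0.5 + abs(row["line_pressure"] - 300.0) / 50.0
    return d / (1.0 + d)


ranking = attribute(
    toy_score,
    {"flow_rate": 1.0, "line_pressure": 310.0},
    {"flow_rate": 2.0, "line_pressure": 300.0},
    ["flow_rate", "line_pressure"],
)
```

In this toy case flow_rate ranks first because resetting it removes most of the score. Each feature is perturbed on its own, so the ranking does not account for interactions between signals.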

Fine-Tuned Explainer Layer

RefuelGuard-LM is an optional research extension under llm/. It fine-tunes a small open-source instruction model to explain synthetic telemetry anomalies from structured monitoring outputs rather than raw, unconstrained prompts.

Grounded inputs

The prompt payload includes mission phase, scenario context, deterministic rule alerts, advisory ML score, top attribution signals, and current telemetry values. The LLM does not detect anomalies or set alert severity.
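
A grounded payload of this kind might be assembled as follows. The field names, the instruction wording, and both function names are assumptions for illustration; only the list of included inputs comes from the description above.

```python
import json


def build_payload(phase, scenario, rule_events, ml_score, attributions, telemetry):
    """Collect upstream monitoring outputs into one structured payload."""
    return {
        "mission_phase": phase,
        "scenario_context": scenario,
        "rule_alerts": rule_events,             # decided by the rule engine
        "advisory_ml_score": round(ml_score, 3),
        "top_attribution_signals": [name for name, _ in attributions[:3]],
        "telemetry": telemetry,
    }


def build_prompt(payload):
    """Render the payload as a prompt that keeps the model in its lane."""
    header = (
        "Explain the monitoring output below for an operator. "
        "Use only the fields provided. Do not add or change alert severity.\n"
    )
    return header + json.dumps(payload, indent=2, sort_keys=True)
```

Keeping severity inside rule_alerts, and never asking the model to produce it, is one way to make the authority boundary visible in the prompt itself.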

Instruction data

llm/data/generate_instruction_data.py creates deterministic JSONL examples for explanation, classification, attribution, rule-vs-ML distinction, and uncertainty wording tasks.
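
Deterministic JSONL generation can be sketched with a seeded generator and sorted keys. The record schema, task identifiers, scenario names, and placeholder outputs below are hypothetical; the project's script defines its own templates.

```python
import json
import random

# Hypothetical identifiers for the five task types named above.
TASKS = ["explanation", "classification", "attribution",
         "rule_vs_ml", "uncertainty_wording"]
SCENARIOS = ["nominal", "partial_blockage", "sensor_drift", "unstable_slosh"]


def make_examples(n, seed=0):
    """Build n instruction records; the same seed yields the same records."""
    rng = random.Random(seed)
    examples = []
    for i in range(n):
        scenario = rng.choice(SCENARIOS)
        task = TASKS[i % len(TASKS)]  # cycle through task types evenly
        payload = {"scenario_context": scenario,
                   "advisory_ml_score": round(rng.uniform(0.0, 1.0), 2)}
        examples.append({
            "task": task,
            "instruction": f"Perform the {task} task using only the monitoring output provided.",
            "input": json.dumps(payload, sort_keys=True),
            # Placeholder text; a real generator would fill a task template.
            "output": f"[template answer for {task} on {scenario}]",
        })
    return examples


def to_jsonl(examples):
    """Serialize one JSON object per line with stable key order."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in examples)
```

Byte-identical output for a given seed makes it possible to diff the dataset between runs and to split held-out examples reproducibly.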

LoRA training

The training path uses Hugging Face Transformers and PEFT to tune a Qwen2.5-0.5B-Instruct adapter, with optional 4-bit loading for constrained GPU environments.

Evaluation result

On 100 held-out synthetic examples, the fine-tuned adapter scored 4.86 out of 5 on a deterministic rubric, compared with 3.69 out of 5 for the base model.

Scenario Results

The validation story is designed to show why both monitoring layers exist. Some injected faults trigger explicit rules. Others stay under hard thresholds but still look unusual to the phase-aware detector.

Key pattern: partial blockage triggers both deterministic pressure/current rules and elevated ML scores, while sensor drift and unstable slosh can raise ML anomaly rates without grouped rule alerts. That is the architectural point of the project.

Nominal

The baseline stays quiet, with no grouped rule alerts and low average anomaly scores. This gives the validation script a stable control case.

Partial blockage

Flow drops while line pressure and pump current rise, so both the rule engine and ML detector have reason to flag the scenario.

Sensor drift

Pressure readings drift upward over the mission. The model sees the pattern, but hard rules can remain quiet because no explicit threshold is crossed.

Unstable slosh

Flow, line pressure, and propellant temperature oscillate together during transfer, making the anomaly multivariate rather than a single obvious hard-limit breach.
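
The scenario comparisons above rest on a per-scenario summary row. A minimal sketch of how such a row could be computed and written as CSV follows, assuming per-row scores in the 0 to 1 range, grouped rule events with a severity key, and a list of top attributed signals. The function names, column names, and the 0.5 anomaly threshold are assumptions, not the behavior of scripts/validate_scenarios.py.

```python
import csv
import io
from collections import Counter
from statistics import mean

SEVERITY_RANK = {"none": 0, "warning": 1, "critical": 2}


def summarize(name, scores, events, top_signals, threshold=0.5):
    """Reduce one scenario run to a single summary row."""
    severities = [e["severity"] for e in events] or ["none"]
    most_common = Counter(top_signals).most_common(1)
    return {
        "scenario": name,
        "max_score": round(max(scores), 3),
        "mean_score": round(mean(scores), 3),
        # Fraction of rows at or above the (illustrative) threshold.
        "anomaly_rate": round(sum(s >= threshold for s in scores) / len(scores), 3),
        "rule_alert_count": len(events),
        "highest_severity": max(severities, key=SEVERITY_RANK.get),
        "top_signal": most_common[0][0] if most_common else "",
    }


def to_csv(summaries):
    """Render summary rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(summaries[0].keys()))
    writer.writeheader()
    writer.writerows(summaries)
    return buffer.getvalue()
```

A row with a nonzero anomaly rate but a rule alert count of zero is the signature of the drift-style scenarios: the advisory layer reacts while the hard limits stay quiet.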

Engineering Takeaways

The strongest part of the project is not the dashboard. It is the system boundary between simulation, rule authority, advisory ML, explanation, and validation.

Phase conditioning matters: telemetry should be scored against the correct operating context, especially when signals intentionally change meaning between phases.
Rules and ML have different jobs: hard-limit alerts are transparent and deterministic, while the ML layer is useful for surfacing softer patterns that deserve review.
LLMs need grounded authority boundaries: the fine-tuned explainer consumes upstream rule and ML outputs, but it does not replace detection, severity assignment, or safety decisions.
Validation should be explicit: synthetic projects need scenario-level evidence, repeatable seeds, regression tests, and clear limitations so the reader can separate demo value from real-world claims.
Explainability can be lightweight: perturbation-based attribution is not causal proof, but it gives users a useful first answer to "which signals made this look anomalous?"

Limitations

This is an educational prototype. It is not flight software, not an autonomous abort system, and not based on real spacecraft telemetry. The important engineering habit here is being explicit about those boundaries.

Synthetic data comes from one simplified generator, so results do not imply real-world generalization.
IsolationForest scores each timestep independently and does not model temporal sequences directly.
RefuelGuard-LM was trained and evaluated on template-generated synthetic explanations, not human-validated spacecraft operations data.
Rule thresholds are illustrative, not derived from spacecraft engineering documentation.
The attribution method assumes feature independence and should be treated as approximate inspection support, not a diagnosis.