The Distracted Pilot
ART's RAG pipeline is retrieving entertainment data instead of navigation coordinates and won't calculate your jump. Instrument the full retrieval pipeline with OpenLLMetry, build a custom OTel metric to quantify the distraction, and write a Prometheus recording rule to prove it.
Mission Objective
- Instrument the full RAG pipeline with OpenLLMetry (add a span named rag.context_assembly with attribute context.categories)
- Implement a custom metric art.rag.retrieval.count to track how often ART retrieves entertainment vs navigation data
- Create a Prometheus recording rule to calculate ART's Distraction Ratio
- Restore the navigation system so ART successfully calculates jump coordinates to RaviHyral
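The first objective can be sketched with the plain OpenTelemetry tracing API. This is a minimal sketch, not the reference solution: the shape of the retrieved documents (dicts with "category" and "text" keys), the function names, and the tracer name "art.rag" are all assumptions, and the brief does not say whether context.categories should be a list or a comma-separated string.

```python
def get_tracer():
    """Return a tracer from the OpenTelemetry API (needs `opentelemetry-api`)."""
    # Imported lazily so the rest of the sketch loads even without the package.
    from opentelemetry import trace

    return trace.get_tracer("art.rag")  # tracer name is a placeholder


def context_categories(documents):
    """Return the distinct categories of the retrieved documents, sorted."""
    return sorted({doc["category"] for doc in documents})


def assemble_context(documents, tracer):
    """Build the prompt context inside a rag.context_assembly span.

    `documents` is assumed to be a list of dicts with hypothetical
    "category" and "text" keys; adapt this to what art.py really returns.
    """
    with tracer.start_as_current_span("rag.context_assembly") as span:
        # A list of strings is a valid OpenTelemetry attribute value.
        span.set_attribute("context.categories", context_categories(documents))
        return "\n".join(doc["text"] for doc in documents)
```

In the application this would be called as assemble_context(docs, get_tracer()). Passing the tracer in as an argument keeps the function easy to exercise with a stand-in tracer.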
Key Learnings
- Instrument RAG pipelines with OpenLLMetry
- Create custom OpenTelemetry metrics in Python
- Write PromQL queries & recording rules in Prometheus
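For the custom-metric learning, a counter from the OpenTelemetry metrics API is one plausible fit. The sketch below is hedged: the metric name art.rag.retrieval.count comes from the mission objective, but the "category" attribute key, the meter name, and the document shape are assumptions made for illustration.

```python
def get_retrieval_counter():
    """Create the art.rag.retrieval.count counter (needs `opentelemetry-api`)."""
    # Imported lazily so the rest of the sketch loads even without the package.
    from opentelemetry import metrics

    meter = metrics.get_meter("art.rag")  # meter name is a placeholder
    return meter.create_counter(
        "art.rag.retrieval.count",
        description="Documents retrieved by ART, labelled by category",
    )


def record_retrievals(counter, documents):
    """Add 1 to the counter for every retrieved document.

    Each document is assumed to be a dict with a hypothetical "category"
    key such as "entertainment" or "navigation".
    """
    for doc in documents:
        counter.add(1, {"category": doc["category"]})
```

Create the counter once at startup and reuse it for every retrieval. Recording the category as an attribute rather than in the metric name is what lets a single PromQL expression compare entertainment against navigation.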
The Story
You're a rogue SecUnit who just escaped from Preservation Station after being identified. A researcher helped you flee aboard the Perihelion, a university research vessel with a very opinionated AI.
The ship's AI agreed to help you disappear. You've nicknamed it ART. The plan: jump to RaviHyral, lay low, and figure out your next move. Except ART was supposed to have the jump coordinates ready an hour ago.
You ping the ship's AI through your internal comm.
SecUnit: "ART. Jump coordinates. Now."
ART: "I'm multitasking. The coordinates are... being compiled."
That's not normal. ART is never vague. You access the ship's diagnostic systems (something you're not supposed to be able to do, but ART hasn't locked you out yet).
Your mission: diagnose ART's distraction using OpenTelemetry and fix the navigation system before you miss your jump.
Credits: The characters of this adventure are borrowed from the Murderbot Diaries series by Martha Wells, a brilliant series that is funny, action-packed, and surprisingly heartwarming. It follows a security unit that hacked its own governor module and now just wants to be left alone to watch media, but keeps getting pulled into human nonsense.
Architecture
The ART Pilot System runs as a local Python application outside Kubernetes, using a RAG (Retrieval-Augmented Generation) architecture. AI infrastructure (Ollama for LLM, Qdrant for vector storage) and observability tools (OpenTelemetry Collector, Jaeger, Prometheus) run inside Kubernetes.
This setup lets you focus on observability patterns: edit Python code, run it, and see traces and metrics immediately without a build or deploy cycle.
Ready to start? Launch in a preconfigured devcontainer (free GitHub account required).
Walkthrough
Open in GitHub Codespaces. The devcontainer is pre-configured and starts automatically. When you push from Codespaces, GitHub forks the repository to your account automatically.
Prefer working locally? Clone the repo and open it in any editor that supports the Dev Containers specification (VS Code, JetBrains IDEs, and others). The devcontainer config will be detected automatically.
Wait ~15 minutes for all infrastructure to initialize.
Open the Ports tab and navigate to each service:
- Port 30102: Prometheus. Explore available metrics and test PromQL queries.
- Port 30103: Jaeger. Inspect distributed traces from ART to verify that tracing works end to end.
The application code is in ./art.py. Instrument it with OpenLLMetry and add the custom metric.
The Prometheus recording rules are in ./manifests/prometheus-rule.yaml. After changing the rule file, apply it to the cluster:
    make apply
Run the application to interact with ART ("Calculate jump"), or generate continuous traffic for your metric graphs:
    make art
    # or for continuous traffic:
    make traffic
Verify traces in Jaeger and the recording rule in Prometheus.
Fix the navigation system so ART returns jump coordinates to RaviHyral.
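Before writing the recording rule, it can help to pin down the arithmetic it should reproduce. The brief does not define the Distraction Ratio precisely, so this sketch assumes it is the share of retrievals that are entertainment. The PromQL string is likewise hypothetical: the exported metric name (art_rag_retrieval_count_total), the category label, and the 5m window depend on how the collector and Prometheus are configured in the challenge.

```python
def distraction_ratio(counts):
    """Return the fraction of retrievals that were entertainment.

    `counts` maps a category name to its retrieval count. Returns 0.0 when
    nothing was retrieved (PromQL would yield NaN for 0/0 instead).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.get("entertainment", 0) / total


# Hypothetical PromQL expression computing the same ratio over 5 minutes.
# The metric and label names are assumptions; check Prometheus for the real ones.
DISTRACTION_RATIO_PROMQL = (
    'sum(rate(art_rag_retrieval_count_total{category="entertainment"}[5m]))'
    " / "
    "sum(rate(art_rag_retrieval_count_total[5m]))"
)

if __name__ == "__main__":
    # Three entertainment retrievals out of four in total.
    print(distraction_ratio({"entertainment": 3, "navigation": 1}))
```

Trying the expression in the Prometheus query view on port 30102 first, then copying it into the rule file, keeps mistakes in metric or label names easy to spot.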
Complete Your Challenge
- When you push from Codespaces, GitHub forks the repository to your account automatically. If you are working locally, fork the repository on GitHub before pushing.
- Verify your solution by running ./verify.sh. If it passes, it generates a Certificate of Completion you can paste into the discussion.
- Share your solutions in the challenge thread on community.offon.dev.
Completed the challenge? Share your achievement on LinkedIn.
Toolbox
- python - programming language used for the ART application
- kubectl - Kubernetes CLI for interacting with the cluster
- kubens - fast way to switch between Kubernetes namespaces
- k9s - terminal UI for managing and inspecting your cluster