The Noise Filter

Expert

ART is flooding Jaeger with 40,000 non-standard spans an hour. Fix the chat span to follow OpenTelemetry GenAI semantic conventions with proper token usage attributes, then configure tail sampling in the Collector to keep only traces that contain errors or exceed 5 seconds.

Mission Objective

Fix ART's chat span to follow OpenTelemetry GenAI semantic conventions, including token usage attributes
Configure tail sampling in the OpenTelemetry Collector to keep only traces that contain errors or take longer than 5 seconds

Key Learnings

OpenTelemetry GenAI semantic conventions
Tail sampling in the OpenTelemetry (OTel) Collector

The Story

You made it to RaviHyral. The Perihelion docked at Outpost Verada, a small independent research station run by a loose collective of academics who agreed to look the other way. In exchange, ART offered to share its observability data with the station's monitoring team.

That was three hours ago. Now the station's lead engineer is at your docking port, looking annoyed.

Engineer: "Your ship's AI is flooding our Jaeger instance. Do you have any idea how many spans it's generating? We can't find anything in there."

SecUnit: "ART."

ART: "Comprehensive telemetry is a feature."

Engineer: "It's 40,000 spans an hour. Every healthy query. Every token. It doesn't even follow conventions. We only care about failures and anomalies, the things that actually need attention."

SecUnit: "ART. Fix it."

ART: "...Fine."

The engineer hands you access to the collector config and the application code, then walks away. Two problems to fix. ART's spans don't follow OTel GenAI semantic conventions, and the collector is forwarding everything.

Credits: The characters of this adventure are borrowed from the Murderbot Diaries series by Martha Wells, a brilliant series that is funny, action-packed, and surprisingly heartwarming. It follows a security unit that hacked its own governor module and now just wants to be left alone to watch media, but keeps getting pulled into human nonsense.

Architecture

Same setup as the intermediate level: the ART Pilot System runs as a local Python application outside Kubernetes with a RAG (Retrieval-Augmented Generation) architecture. AI infrastructure (Ollama, Qdrant) and observability tools (OpenTelemetry Collector, Jaeger) run inside Kubernetes.

Ready to start?

Launch in a preconfigured devcontainer (development container), a portable, reproducible coding environment defined by a configuration file.


Free GitHub account required

Walkthrough

Open in GitHub Codespaces. The devcontainer is pre-configured and starts automatically.

Prefer working locally? Clone the repo and open it in any editor that supports the Dev Containers specification (Visual Studio Code, JetBrains, and others). The devcontainer config is detected automatically.
Wait ~15 minutes for all infrastructure to initialize.
Open the Ports tab and navigate to each service:
- Port 30103: Jaeger. Verify your spans look correct and that tail sampling works as expected.
Fix two things:
1. The application code in ./art.py: update the chat span to follow OpenTelemetry GenAI semantic conventions, including token usage attributes.
2. The collector config in ./manifests/otel-collector-config.yaml: configure tail sampling to keep only traces that contain errors or take longer than 5 seconds.
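For the first fix, the GenAI conventions mostly come down to the span name and a handful of attribute keys. The sketch below is a minimal illustration and is not ART's actual code.

It rests on a few assumptions:
- ART calls Ollama's chat API and reads the token counts Ollama returns in prompt_eval_count and eval_count.
- The helper names chat_span_name and chat_span_attributes are hypothetical.
- The provider attribute is gen_ai.system with the custom value "ollama". Newer releases of the conventions rename that key to gen_ai.provider.name, so check which one verify.sh expects.

```python
from typing import Any, Mapping


def chat_span_name(model: str) -> str:
    # GenAI convention for span names: "{gen_ai.operation.name} {gen_ai.request.model}"
    return f"chat {model}"


def chat_span_attributes(model: str, response: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: build GenAI attributes from an Ollama chat response."""
    attributes: dict[str, Any] = {
        "gen_ai.operation.name": "chat",
        # Assumption: newer semconv releases use gen_ai.provider.name instead.
        "gen_ai.system": "ollama",
        "gen_ai.request.model": model,
    }
    # Ollama echoes the model that served the request.
    if "model" in response:
        attributes["gen_ai.response.model"] = response["model"]
    # Ollama token counts: prompt_eval_count = prompt tokens, eval_count = generated tokens.
    if "prompt_eval_count" in response:
        attributes["gen_ai.usage.input_tokens"] = int(response["prompt_eval_count"])
    if "eval_count" in response:
        attributes["gen_ai.usage.output_tokens"] = int(response["eval_count"])
    return attributes
```

With the OpenTelemetry Python API, follow these steps:
- Before sending the request, start the span with tracer.start_as_current_span(chat_span_name(model), kind=SpanKind.CLIENT). The span is CLIENT because ART is calling a remote model.
- Once the response arrives, call span.set_attributes(chat_span_attributes(model, response)).

Keys are only set when Ollama actually returns the field, so a failed or partial response does not produce fake zero counts.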
After changing art.py, restart the traffic generator (make traffic) so it picks up the new instrumentation. After changing the collector config, apply it:
```
kubectl apply -f manifests/otel-collector-config.yaml -n otel
kubectl rollout restart deployment/collector -n otel
```
Then generate traces:
```
make traffic
```
Verify in Jaeger that the spans follow the conventions and that only traces containing errors or lasting longer than 5 seconds appear.
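To predict what should survive in Jaeger, it helps to model the decision the Collector's tail_sampling processor makes. The processor buffers every span of a trace for decision_wait (30 seconds by default). In the simple case, it then keeps the trace if any policy matches.

For this challenge, that likely means two policies:
- a status_code policy with status_codes: [ERROR]
- a latency policy with threshold_ms: 5000

The processor must also be listed in the traces pipeline.

The Python below is a hypothetical model of that OR logic, not the processor's code. The Span class and the sampled_traces function are illustrative names. The strict "longer than 5 seconds" comparison follows the challenge wording rather than the processor's exact boundary handling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

NS_PER_MS = 1_000_000


@dataclass(frozen=True)
class Span:
    trace_id: str
    start_ns: int           # span start time, nanoseconds
    end_ns: int             # span end time, nanoseconds
    status: str = "UNSET"   # OTel status code: UNSET, OK or ERROR


def sampled_traces(spans: Iterable[Span], threshold_ms: int = 5_000) -> set[str]:
    """Hypothetical model: keep a trace if it has an ERROR span OR runs past threshold_ms."""
    # Group spans by trace, as the processor buffers them before deciding.
    by_trace: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    kept: set[str] = set()
    for trace_id, trace_spans in by_trace.items():
        # status_code policy: any span in the trace ended with ERROR.
        has_error = any(s.status == "ERROR" for s in trace_spans)
        # latency policy: earliest start to latest end across the whole trace.
        duration_ns = max(s.end_ns for s in trace_spans) - min(s.start_ns for s in trace_spans)
        too_slow = duration_ns > threshold_ms * NS_PER_MS
        if has_error or too_slow:  # policies are OR'd
            kept.add(trace_id)
    return kept
```

A trace whose two spans run from 0 to 2 s and from 4 to 6 s counts as 6 seconds long and is kept, even though neither span alone is slow. The duration is measured from the earliest start to the latest end, ignoring gaps in between. Healthy, fast traces are dropped, which is what should bring the 40,000 spans an hour down to the ones worth looking at.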

Complete Your Challenge

When you push from Codespaces, GitHub forks the repository to your account automatically. If you are working locally, fork the repository on GitHub before pushing.
Verify your solution:
```
./verify.sh
```
If it passes, it generates a Certificate of Completion you can paste into the discussion.
Share your solutions in the challenge thread on community.offon.dev.


Toolbox

python - programming language used for the ART application
kubectl - Kubernetes CLI (Command Line Interface) for interacting with the cluster
kubens - fast way to switch between Kubernetes namespaces
k9s - terminal UI (User Interface) for managing and inspecting your cluster

