A practical guide to adding distributed tracing to multi-agent AI swarms using Jaeger v2 and OpenTelemetry. Covers setting up Jaeger v2 with Docker (including persistent Badger storage and permission fixes), wiring Claude Code lifecycle hooks to emit spans via Python, and a span hierarchy model that captures subagent activity, token counts, and tool calls. Includes lessons learned: correlating pre/post tool events without a shared ID, persisting span context across subprocess invocations via temp files, flush timeout tuning, and a full environment variable reference.
Table of Contents
- What Is Distributed Tracing?
- Why Jaeger v2?
- Prerequisites
- Installing Docker on Debian
- Setting Up Jaeger v2
- Setting Up Claude Forge Tracing
- Understanding the Span Model
- Instrumenting a Multi-Agent Swarm
- Viewing Traces in the Jaeger UI
- Lessons from the Trenches
- Environment Variable Reference
- Wrapping Up