GPU observability is fragmented across siloed tools — hardware metrics in one dashboard, inference metrics in another, cost in a spreadsheet. A framework of 8 connected layers is proposed: L1 GPU silicon, L2 CUDA/NCCL, L3 host/OS, L4 workload identity (Kubernetes/Slurm), L5 training, L6 inference, L7 GenAI semantics, and L8 business/cost. The key insight is that correlation across layers — enabled by OpenTelemetry shared resource attributes — reduces debugging time from 2 hours to 2 minutes. A concrete example traces a TTFT latency spike through thermal throttling on one GPU back to a failing fan sensor on the host node. The post also introduces l9gpu, an open-source OTel agent covering all 8 layers, and compares existing tools (Datadog, Grafana+DCGM, CloudWatch) which typically cover only 2-3 layers.
Table of contents
The 8 LayersWhy Correlation Matters More Than CoverageThe Bridge Layer: GPU-to-Workload IdentityWhat End-to-End Looks LikeThe Gap in Existing ToolsThe OpenTelemetry AdvantageWe built l9gpu to make this realThe seriesSort: