Trust in AI systems doesn't come from understanding the models themselves, but from building robust observability around them. Using OpenTelemetry, developers can instrument AI workloads across four layers: development tooling (tracking AI coding assistants such as Claude Code), operational metrics (token usage, cost, finish reasons), decision tracing (end-to-end traces of the agentic loop, with custom attributes explaining why each decision was made), and quality monitoring (using small language models as evaluators for hallucinations, toxicity, and policy violations). Real-time guardrails can block harmful inputs and outputs before they reach users. The key insight is that traditional monitoring signals are insufficient for AI applications: you need to trace decision paths, not just inputs and outputs, and instrument specifically for AI workloads using the OpenTelemetry semantic conventions.
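To make the operational-metrics and decision-tracing layers concrete, here is a minimal sketch in plain Python. It deliberately does not use the OpenTelemetry SDK, so that it runs without dependencies: `Span`, `Recorder`, `fake_model_call`, and `handle_request` are hypothetical stand-ins, not part of any library. The `gen_ai.*` attribute names follow the OpenTelemetry GenAI semantic conventions as currently published (they are still evolving, so check the current spec), while `app.decision.reason` is an invented custom attribute used only for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """Stand-in for a tracing span: a named, timed unit of work with attributes."""
    name: str
    parent: Optional[str] = None
    attributes: Dict[str, Any] = field(default_factory=dict)
    duration_s: float = 0.0

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class Recorder:
    """Collects finished spans in memory; a real setup would export them."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def start_span(self, name: str) -> Iterator[Span]:
        # The innermost open span becomes the parent, forming a trace tree.
        parent = self._stack[-1].name if self._stack else None
        span = Span(name=name, parent=parent)
        self._stack.append(span)
        start = time.perf_counter()
        try:
            yield span
        finally:
            span.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.finished.append(span)


def fake_model_call(prompt: str) -> Dict[str, Any]:
    """Hypothetical model client that returns canned text and usage numbers."""
    return {
        "text": "refund approved",
        "input_tokens": len(prompt.split()),  # crude word count, not a real tokenizer
        "output_tokens": 2,
        "finish_reason": "stop",
    }


def handle_request(recorder: Recorder, prompt: str) -> str:
    """One step of an agent loop, traced as a parent span with a model-call child."""
    with recorder.start_span("agent.handle_request") as root:
        # Decision tracing: record why a path was taken, not only the result.
        route = "refund" if "refund" in prompt.lower() else "general"
        root.set_attribute("app.decision.reason", f"routed to {route} flow")  # hypothetical attribute

        with recorder.start_span("chat demo-model") as llm:
            result = fake_model_call(prompt)
            # Operational metrics, named after the GenAI semantic conventions.
            llm.set_attribute("gen_ai.operation.name", "chat")
            llm.set_attribute("gen_ai.request.model", "demo-model")
            llm.set_attribute("gen_ai.usage.input_tokens", result["input_tokens"])
            llm.set_attribute("gen_ai.usage.output_tokens", result["output_tokens"])
            llm.set_attribute("gen_ai.response.finish_reasons", [result["finish_reason"]])
        return result["text"]
```

Calling `handle_request(Recorder(), "Please refund my order")` leaves two spans in `finished`: the model call, carrying token counts and the finish reason, and its parent, carrying the reason for the routing decision. With the real OpenTelemetry SDK the same attributes would be set on spans created by a tracer and sent to an exporter rather than appended to a list.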

31m watch time
