A comprehensive framework for implementing LLM observability in production systems. Covers the architecture, telemetry models, and KPIs needed to monitor large language model applications across reliability, quality, safety, cost, and governance dimensions. Explains how to trace multi-step agent workflows, implement guardrails and evaluations, and manage costs and incidents.
Table of contents
- What is LLM observability
- Reference architecture
- Telemetry model and schema
- Observability KPIs for LLM systems
- Tracing and logging patterns
- Agent observability: measuring tools, plans, and outcomes
- Guardrails and evaluations
- AI cost observability
- Incident response for LLM systems
- Implementation roadmap
- Portkey’s approach to LLM observability