AI Observability: Everything Is Unpredictable
Watch the full video: https://youtu.be/98yYcWwD95I

DevOps Toolkit

LLM-based AI systems are inherently non-deterministic: inputs, reasoning, and outputs all vary unpredictably, so traditional monitoring alone is insufficient. Extending existing OpenTelemetry infrastructure with the GenAI semantic conventions and LLM-specific instrumentation provides a unified trace view covering HTTP calls, database queries, LLM calls, and tool executions. For deeper AI-specific needs such as prompt versioning, evaluation, and model comparison, tools like Langfuse, Arize Phoenix (OTel-native), and LangSmith add an extra layer. Key metrics to track include task success rate, number of tool calls per interaction, and interaction duration. Without them, there is no way to know whether an agent is genuinely helping users.
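
To make the three metrics concrete, here is a minimal sketch of how they might be computed once each interaction has been reduced to a simple record. The `Interaction` record and the `summarize` function are hypothetical names invented for illustration; they are not part of OpenTelemetry, Langfuse, Arize Phoenix, or LangSmith. In a real setup these values would more likely be derived from trace data, and how "success" is judged is left as an assumption here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interaction:
    """One user-to-agent interaction (hypothetical record shape)."""
    succeeded: bool     # assumption: something upstream decided the task was completed
    tool_calls: int     # number of tool executions during the interaction
    duration_s: float   # wall-clock duration of the interaction, in seconds


def summarize(interactions):
    """Return task success rate, average tool calls, and average duration."""
    if not interactions:
        raise ValueError("need at least one interaction")
    return {
        # Fraction of interactions where the task was marked as completed.
        "task_success_rate": sum(i.succeeded for i in interactions) / len(interactions),
        # A rising average can hint that the agent is looping or struggling.
        "avg_tool_calls": mean(i.tool_calls for i in interactions),
        "avg_duration_s": mean(i.duration_s for i in interactions),
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        Interaction(succeeded=True, tool_calls=2, duration_s=4.0),
        Interaction(succeeded=False, tool_calls=6, duration_s=10.0),
        Interaction(succeeded=True, tool_calls=1, duration_s=1.0),
        Interaction(succeeded=True, tool_calls=3, duration_s=5.0),
    ]
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```

Averages are used only to keep the sketch short; percentiles may describe duration and tool-call counts better when a few interactions are much slower than the rest.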