Monitoring Cortex Agent Performance Using Trace Data

Snowflake Cortex Agents emit detailed trace data, queryable through the SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS table function, that covers span types such as chat, planning, response generation, and tool calls. The key performance metrics to monitor at the span level are token consumption, duration, and status codes. Common failure patterns include token spikes caused by multi-turn context accumulation or retrieval configuration changes, and usage volatility that can mask elevated error rates. Combining signals across span types (for example, high planning token counts alongside low tool-call completion rates) speeds up root cause analysis. The post also introduces Monte Carlo's agent observability integration with Snowflake Intelligence for automated monitoring at scale.
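
To make the span-level metrics and the combined-signal idea concrete, here is a minimal Python sketch that works on span records you have already exported. It assumes a simplified, hypothetical record shape: the fields trace_id, span_type, total_tokens, duration_ms, and status_code, the span-type strings "planning" and "tool_call", and the "OK" status value are illustrative assumptions, not the actual columns or values returned by GET_AI_OBSERVABILITY_EVENTS. The helpers summarize, token_spikes, and planning_without_tool_success are likewise hypothetical names, and "tool-call completion" is approximated here as the share of tool-call spans without an error status.

```python
from dataclasses import dataclass
from statistics import median


# Hypothetical, simplified span record. Field names and values are
# assumptions for illustration, NOT the real column names returned by
# SNOWFLAKE.LOCAL.GET_AI_OBSERVABILITY_EVENTS.
@dataclass
class Span:
    trace_id: str
    span_type: str      # assumed values, e.g. "planning" or "tool_call"
    total_tokens: int
    duration_ms: float
    status_code: str    # assumed: "OK" for success, anything else is an error


def summarize(spans):
    """Aggregate token use, median duration, and error rate per span type."""
    groups = {}
    for s in spans:
        groups.setdefault(s.span_type, []).append(s)
    summary = {}
    for span_type, items in groups.items():
        errors = sum(1 for s in items if s.status_code != "OK")
        summary[span_type] = {
            "count": len(items),
            "total_tokens": sum(s.total_tokens for s in items),
            "median_duration_ms": median(s.duration_ms for s in items),
            # A rate rather than a raw count, so traffic swings don't hide errors.
            "error_rate": errors / len(items),
        }
    return summary


def token_spikes(spans, factor=3.0):
    """Flag spans using more than `factor` times the median tokens of their span type."""
    by_type = {}
    for s in spans:
        by_type.setdefault(s.span_type, []).append(s.total_tokens)
    baselines = {t: median(v) for t, v in by_type.items()}
    return [s for s in spans if s.total_tokens > factor * baselines[s.span_type]]


def planning_without_tool_success(summary, token_threshold, min_tool_success=0.9):
    """Combined signal: heavy planning token use alongside low tool-call completion."""
    planning = summary.get("planning")
    tools = summary.get("tool_call")
    if planning is None or tools is None:
        return False
    tool_success = 1 - tools["error_rate"]  # approximation of completion rate
    return planning["total_tokens"] > token_threshold and tool_success < min_tool_success
```

Comparing per-type error rates rather than raw error counts is a deliberate choice: when request volume is volatile, a quiet period can make error counts look healthy even as the error rate climbs. The fixed multiplier over a per-type median is only one simple spike heuristic; a rolling baseline per agent would likely be more robust in practice.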

#llm

#ai-agents

#observability

#snowflake

Apr 20 • 12 min read • From medium.com • By Michael Segner

Table of contents

How to identify and fix agent performance issues using Cortex Agent telemetry
  Observing your Cortex Agent fleet
  What Snowflake logs for Cortex Agents: a look under the hood
    Mapping records to span types
  Key performance metrics to monitor
    Total tokens
    Duration
    Status codes
    Combining signals
  Common agent performance issues
    Token spikes: what they look like and what causes them
    Usage volatility
  The underlying principle