One AI agent is a feature. Fifty agents is a distributed systems problem nobody's discussing. I've seen this pattern: teams build one agent, then five, then drown in coordination problems that have nothing to do with LLMs. Agent handoffs fail silently. Data goes stale. Decisions become untraceable. Drawing from Databricks production deployments, I'll expose the orchestration anti-patterns killing multi-agent systems and show agent handoff protocols that work: state management, data contracts, failure modes. You'll see when to choreograph versus orchestrate, plus a live multi-agent workflow with proper observability. This applies distributed systems engineering to agents: the infrastructure layer everyone needs but nobody's building.

Sandipan Bhaumik - Data & AI Tech Lead, Databricks

Sandipan Bhaumik has spent 18 years building data and AI systems inside environments that can't afford them to fail - the NHS, Tier 1 banks, and large enterprises across EMEA. At AWS and now Databricks, he's seen firsthand where multi-agent systems break down between architecture and production. He is a regular speaker on data and AI system architecture best practices, runs a community of AI practitioners, and he's here to talk about what actually holds together when you scale agentic AI systems in production.

Socials:
https://www.linkedin.com/in/sandipanbhaumik

Slides:
https://drive.google.com/file/d/18LqVzhfVS3iULYuy2EshWoMLmQt3rdpT/view?usp=sharing

AI Engineer

A practitioner with 18 years of distributed systems experience shares hard-won lessons from deploying multi-agent AI systems in production. The talk covers why scaling from one to multiple agents creates exponential coordination complexity, illustrated by a real financial services race condition bug caused by stale cache reads. Key patterns covered include: choreography vs. orchestration (with a decision framework), immutable state snapshots with versioning to eliminate race conditions, data contracts between agents, circuit breaker patterns for failure isolation, and saga/compensation patterns for rollback. A reference production architecture using LangGraph, Databricks, Delta Lake, Unity Catalog, and MLflow is presented.
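
As a rough illustration of the immutable, versioned snapshot idea mentioned above, here is a minimal in-process Python sketch. It is not taken from the talk: the names `Snapshot`, `SnapshotStore`, and `StaleSnapshotError` are hypothetical, and a production system would likely rely on the versioning of its storage layer rather than an in-memory list.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


class StaleSnapshotError(Exception):
    """Raised when an agent commits against a version that is no longer current."""


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical shape: a version number plus a read-only view of the state.
    version: int
    data: Mapping[str, object]


class SnapshotStore:
    """Append-only history of snapshots; existing snapshots are never mutated."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._history = [Snapshot(0, MappingProxyType(dict(initial)))]

    def latest(self) -> Snapshot:
        with self._lock:
            return self._history[-1]

    def commit(self, based_on: Snapshot, changes: dict) -> Snapshot:
        # Optimistic concurrency check: reject writes derived from a stale read.
        with self._lock:
            current = self._history[-1]
            if based_on.version != current.version:
                raise StaleSnapshotError(
                    f"read v{based_on.version}, but current is v{current.version}"
                )
            merged = {**current.data, **changes}
            new = Snapshot(current.version + 1, MappingProxyType(merged))
            self._history.append(new)
            return new


if __name__ == "__main__":
    store = SnapshotStore({"balance": 100})
    seen_by_a = store.latest()  # agent A reads version 0
    seen_by_b = store.latest()  # agent B reads version 0 as well
    store.commit(seen_by_a, {"balance": 80})  # A commits first
    try:
        store.commit(seen_by_b, {"balance": 50})  # B's read is now stale
    except StaleSnapshotError as err:
        print("rejected:", err)
```

The point of the design is that a second agent working from an outdated read is refused explicitly and must re-read the latest snapshot, instead of silently overwriting the first agent's result.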

From Chaos to Choreography: Multi-Agent Orchestration Patterns That Actually Work — Sandipan Bhaumik