MLflow 3.11 introduces multimodal tracing, which automatically extracts binary content (images, audio, PDFs) from AI agent traces instead of storing raw base64 strings in the trace database. Binary data is offloaded to artifact storage (S3, Azure Blob, GCS, etc.) while spans retain only lightweight reference URIs. The feature works out of the box with autologging for OpenAI, Anthropic, Gemini, Bedrock, and LangChain, covering 8 multimodal data patterns. A manual Attachment API handles custom file types. The MLflow UI renders images as thumbnails, audio with inline playback controls, and PDFs in an embedded viewer, enabling proper debugging of vision and audio model behavior.
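The offload pattern described above can be illustrated with a small self-contained sketch. This is a conceptual illustration, not MLflow's internal API: the function name `offload_binary`, the span dictionary shape, and the `artifact://` URI scheme are all hypothetical, chosen only to show how a base64 payload can be moved to artifact storage while the span keeps a lightweight reference.

```python
# Conceptual sketch (NOT MLflow's actual implementation) of offloading
# binary content from a trace span: the base64 payload is decoded,
# written to artifact storage, and replaced in the span by a short URI.
import base64
import hashlib
import os
import tempfile


def offload_binary(span: dict, key: str, artifact_dir: str) -> dict:
    """Replace the base64 payload at span[key] with a reference URI."""
    raw = base64.b64decode(span[key])
    # Content-addressed filename so identical payloads deduplicate.
    name = hashlib.sha256(raw).hexdigest()[:16] + ".bin"
    with open(os.path.join(artifact_dir, name), "wb") as f:
        f.write(raw)
    slim = dict(span)
    slim[key] = f"artifact://{name}"  # span retains only the reference
    return slim


artifact_dir = tempfile.mkdtemp()  # stands in for S3/Azure Blob/GCS
span = {
    "name": "vision_call",
    "image": base64.b64encode(b"\x89PNG fake image bytes").decode(),
}
slim = offload_binary(span, "image", artifact_dir)
print(slim["image"])  # an artifact:// reference instead of raw base64
```

In the real feature this replacement happens automatically inside autologging for the supported providers; the manual Attachment API covers file types the extractors do not recognize.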
Table of contents
- Why Text-Only Traces Fall Short
- How It Works
- Auto-Extraction: Zero Code Changes
- Manual Attachments for Custom Content
- Rich Rendering in the Trace UI
- Getting Started