Cut evaluation costs by up to 95% with MLflow's new token-based judge pricing, and improve domain-specific quality with support for custom judges.


Databricks introduces token-based pricing for MLflow GenAI evaluation, reducing costs by up to 95% compared to fixed-block pricing. The platform now supports custom judges built on any LLM (models from providers such as OpenAI and Anthropic, or fine-tuned models). Databricks is also open-sourcing production-tested evaluation prompts validated across the finance, healthcare, and technical documentation domains. Teams can evaluate agents on metrics such as correctness, faithfulness, relevance, and safety while keeping full control over evaluation logic and scaling to production workloads.

Build High-Quality, Domain-Specific Agents at 95% Lower Cost

New Token-Based Pricing Model for Predefined Judges

Open-Sourcing Battle-Tested Evaluation Prompts

Beyond Built-in Judges: Bring Your Own Model
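
To make the "bring your own model" idea concrete, here is a minimal sketch of a provider-agnostic judge in plain Python. It is not MLflow's API: the `judge_correctness` function, the `Verdict` type, the prompt text, and the `stub_model` callable are all hypothetical names introduced for illustration. The only assumption is that the judge model can be wrapped as a function that takes a prompt string and returns a text completion, which is how an OpenAI, Anthropic, or fine-tuned model could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt template; not one of the open-sourced MLflow prompts.
JUDGE_PROMPT = (
    "You are grading an answer for correctness.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reference: {reference}\n"
    "Reply with 'yes' or 'no' on the first line, "
    "then a one-sentence rationale on the next line."
)


@dataclass
class Verdict:
    passed: bool
    rationale: str


def judge_correctness(
    complete: Callable[[str], str],
    question: str,
    answer: str,
    reference: str,
) -> Verdict:
    """Ask any text-completion callable to grade an answer.

    `complete` is an assumed adapter around whichever LLM is used as the
    judge; it takes a prompt and returns the model's raw text reply.
    """
    prompt = JUDGE_PROMPT.format(
        question=question, answer=answer, reference=reference
    )
    raw = complete(prompt).strip()
    # The first line carries the yes/no decision, the rest is the rationale.
    first_line, _, rest = raw.partition("\n")
    passed = first_line.strip().lower().startswith("yes")
    return Verdict(passed=passed, rationale=rest.strip())


def stub_model(prompt: str) -> str:
    # Stand-in for a real provider call, so the sketch runs offline.
    return "yes\nThe answer matches the reference."


verdict = judge_correctness(stub_model, "What is 2 + 2?", "4", "4")
```

Because the judge depends only on the `complete` callable, changing providers means changing the adapter, not the evaluation logic. A production judge would also need to handle malformed replies, retries, and rate limits, which this sketch leaves out.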