A detailed walkthrough of an 8-stage evaluation framework for AI agents, developed while building an IT self-service agent quickstart on Red Hat OpenShift AI. Covers the progression from manual testing to automated multi-turn conversation evaluation using DeepEval, including custom metrics with LLM-as-judge, conversation generation, known-bad test cases, CI/CD integration, and cost tracking. Key insights include the need for capable evaluator models, the importance of testing your metrics against known failures, and practical token cost estimates for running evaluations at scale.
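To ground the summary, here is a minimal sketch of what a DeepEval LLM-as-judge metric looks like in practice. The rubric text, sample conversation turn, and evaluator model choice are illustrative assumptions, not the quickstart's actual configuration:

```python
from deepeval import evaluate
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# GEval builds an LLM-as-judge metric from a natural-language rubric.
helpfulness = GEval(
    name="Helpfulness",
    criteria="Does the agent resolve the user's IT request or clearly escalate it?",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
    model="gpt-4o",  # evaluator model; the article stresses it must be capable
)

# A hypothetical single exchange from an IT self-service conversation.
test_case = LLMTestCase(
    input="My laptop won't connect to the VPN.",
    actual_output="Let's check your VPN client version first. Which OS are you on?",
)

# Runs the judge model against each test case and reports per-metric results.
evaluate(test_cases=[test_case], metrics=[helpfulness])
```

The article extends this idea to full multi-turn conversations, generated test data, and known-bad cases, as the table of contents below indicates.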

Table of contents
- About AI quickstarts
- Our evaluations journey
- An example conversation
- Manual testing with a few predefined conversations
- Automated evaluation
- Generating conversations
- Known bad conversations
- The complete flow
- Cost
- Wrapping up
- Next steps
- To learn more