A detailed walkthrough of an 8-stage evaluation framework for AI agents, developed while building an IT self-service agent quickstart on Red Hat OpenShift AI. Covers the progression from manual testing to automated multi-turn conversation evaluation using DeepEval, including custom metrics with LLM-as-judge, conversation

Table of contents
- About AI quickstarts
- Our evaluations journey
- An example conversation
- Manual testing with a few predefined conversations
- Automated evaluation
- Generating conversations
- Known bad conversations
- The complete flow
- Cost
- Wrapping up
- Next steps
- To learn more