AI systems break traditional QA assumptions because their outputs are probabilistic and context-dependent: the same input can produce different results from one run to the next. Testing AI requires shifting from exact assertions to scored evaluations across dimensions like relevance, factual grounding, safety, and usefulness. Effective AI testing must cover both model-level and system-level evaluation, include adversarial and regression cases, treat prompt changes with the same rigor as code changes, and extend into production through continuous monitoring. Security concerns such as prompt injection, and compliance obligations such as the EU AI Act and GDPR, also fall within the testing scope. Human review remains a structural requirement alongside automated and LLM-as-a-judge evaluation layers.
Table of contents
1. Traditional software testing was built for deterministic systems
2. What makes AI testing fundamentally different
3. AI outputs are no longer simply "correct" or "incorrect"
4. Why testing the AI model alone is not enough
5. AI test design becomes behavior-oriented
6. Data becomes a primary source of failure
7. Security and compliance become part of the testing scope
8. Human evaluation still matters
9. AI systems require continuous testing after deployment
QA becomes reliability engineering
FAQ