Satisfice is a publication authored by James Bach, a software testing expert and advocate for exploratory testing. Readers can explore articles covering topics such as software testing techniques, test management, and quality assurance practices. Additionally, they can learn about critical thinking in testing, risk-based testing approaches, and improving software quality.

Satisfice

Testing large language models presents unique challenges compared to traditional software testing. LLMs are expensive to test thoroughly, behave inconsistently, and create unbounded regression testing problems. Common GenAI demos are unreliable because they show single runs without careful output analysis. The LARC (LLM Aggregated Retrieval Consistency) methodology tests self-consistency by repeatedly prompting models to retrieve information from text and checking for contradictions across multiple runs, revealing reliability issues in basic retrieval tasks.

Seriously Testing LLMs