A testing professional shares empirical data from a LARC experiment that ran four LLMs at different temperatures and with different prompt styles on a simple task: extracting apple pie ingredients from unstructured text. The experiment shows why AI systems should be evaluated on evidence before deployment; the raw data is stored in MongoDB, and the analysis reports are available online. The work is part of developing a 'Testers and AI' class and aims to establish rigorous testing methodologies for AI systems.