The FACTS Benchmark Suite has been released to systematically evaluate factual accuracy of large language models across four dimensions: parametric knowledge, search-based retrieval, multimodal understanding, and grounding in context. Comprising 3,513 curated examples managed through Kaggle, the benchmark reveals that even top-performing models like Gemini 3 Pro achieve only 68.8% overall accuracy, with multimodal factuality proving particularly challenging. The suite provides a standardized framework for measuring how reliably LLMs produce factually correct responses in real-world usage scenarios.

2m read time · From infoq.com
