The FACTS Benchmark Suite provides a systematic evaluation of the factuality of Large Language Models (LLMs) across Parametric, Search, and Multimodal reasoning, as well as grounding.


DeepMind

Google DeepMind and Kaggle launched the FACTS Benchmark Suite to systematically evaluate LLM factual accuracy across four areas: parametric knowledge (internal knowledge recall), search-augmented retrieval, multimodal reasoning with images, and grounding (context-based answers). The suite contains 3,513 publicly available examples with private held-out test sets, and Kaggle hosts a public leaderboard tracking model performance using an aggregate FACTS Score across all benchmarks.
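The leaderboard condenses the individual benchmarks into a single FACTS Score. The exact aggregation formula is not given here, so the sketch below assumes a plain unweighted mean of per-benchmark accuracies. The function name `facts_score` and the benchmark keys are hypothetical and only illustrate how such an aggregate could be computed.

```python
from statistics import mean

# Hypothetical keys for the four areas named above.
BENCHMARKS = ("parametric", "search", "multimodal", "grounding")


def facts_score(accuracies: dict[str, float]) -> float:
    """Combine per-benchmark accuracies (each in [0, 1]) into one score.

    Assumption: an unweighted mean over the four benchmarks. The official
    FACTS Score may be computed differently (for example, over both the
    public and private held-out splits).
    """
    # Refuse to aggregate if any benchmark is missing.
    missing = [name for name in BENCHMARKS if name not in accuracies]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    # Accuracies are fractions, so reject anything outside [0, 1].
    for name in BENCHMARKS:
        if not 0.0 <= accuracies[name] <= 1.0:
            raise ValueError(f"{name} accuracy out of range: {accuracies[name]}")
    # Each benchmark contributes equally, regardless of its example count.
    return mean(accuracies[name] for name in BENCHMARKS)
```

An unweighted mean keeps any single area from dominating the aggregate simply because it contains more examples. Weighting by example count would be an alternative design, but it would let the largest benchmark drive the overall score.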

FACTS Benchmark Suite: a new way to systematically evaluate LLMs factuality