The FACTS Benchmark Suite has been released to systematically evaluate factual accuracy of large language models across four dimensions: parametric knowledge, search-based retrieval, multimodal understanding, and grounding in context. Comprising 3,513 curated examples managed through Kaggle, the benchmark reveals that even
Sort: