Meta AI researchers propose scalable memory layers that improve factual knowledge and reduce hallucinations in large language models (LLMs) by increasing their learning capacity without requiring additional compute. These layers use sparse activations and a key-value lookup mechanism, making them memory-intensive but compute-efficient. Using parallelization across GPUs, custom CUDA kernels, and parameter sharing, the researchers integrated these layers into existing LLMs. The memory-enhanced models showed significant gains on factual knowledge tasks and better efficiency than comparable dense and mixture-of-experts (MoE) models.
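To make the mechanism concrete, here is a minimal sketch of a sparse key-value memory lookup in NumPy. This is an illustration of the general technique, not Meta's implementation: the function name, shapes, and top-k scheme are assumptions. The point is that only the k best-matching keys are activated per query, so compute stays low even when the memory table holds many parameters.

```python
import numpy as np

def memory_layer_lookup(query, keys, values, k=4):
    """Hypothetical sparse memory lookup: blend only the k best-matching value rows."""
    scores = keys @ query                        # similarity of query to every key
    top_idx = np.argpartition(scores, -k)[-k:]   # indices of the k highest scores
    top_scores = scores[top_idx]
    # softmax over only the k selected scores (sparse activation)
    weights = np.exp(top_scores - top_scores.max())
    weights /= weights.sum()
    return weights @ values[top_idx]             # weighted sum of k value vectors

# Demo with random parameters (toy sizes; real memory layers use millions of slots)
rng = np.random.default_rng(0)
n_slots, d = 10_000, 64
keys = rng.standard_normal((n_slots, d)).astype(np.float32)
values = rng.standard_normal((n_slots, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

out = memory_layer_lookup(query, keys, values, k=4)
print(out.shape)  # (64,)
```

Note the trade-off the article describes: the table of keys and values can grow very large (memory-intensive), but each forward pass touches only k rows (compute-efficient), unlike a dense layer that multiplies the query by every parameter.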

4m read time · From venturebeat.com
Table of contents
- Dense and memory layers
- Upgrading memory layers
- Meta’s memory layers in action
