New LLM optimization technique slashes memory costs up to 75%


Researchers at Sakana AI have developed 'Universal Transformer Memory,' a technique that uses Neural Attention Memory Models (NAMMs) to optimize language models by retaining pertinent information and discarding redundant tokens from the model's context. The innovation helps enterprises cut the cost and improve the efficiency of applications built on large language models. NAMMs are trained separately from the base model and can be applied to a variety of models, improving their performance across tasks while reducing KV-cache memory use by up to 75%. The technique is particularly useful for enterprises whose applications process large amounts of data.

5 min read · From venturebeat.com
Table of contents
- Optimizing Transformer memory
- Neural Attention Memory Models
- Universal memory in action
- Task-dependent behavior
