DeepSeek Gave LLMs a Real Memory (It's Not RAG)


DeepSeek's Engram module adds a hash-based, explicit memory lookup to transformers; it is distinct from RAG. Instead of reconstructing factual knowledge through dense matrix multiplications, Engram applies a multiplicative XOR hash to token n-grams and uses the result to index large embedding tables, so facts are retrieved directly. Multi-head hashing across 8 independent tables mitigates collisions. A context-aware gating mechanism (a query-key dot product passed through a sigmoid) filters out irrelevant lookups, and a short depthwise causal convolution widens the receptive field. Placed at transformer layer 2, Engram injects factual context early, freeing subsequent layers for reasoning. Experiments show the optimal parameter split is roughly 75-80% MoE and 20-25% Engram. Because the lookup indices depend only on the input tokens, the embedding tables can reside in CPU RAM and be prefetched before computation, avoiding GPU memory overhead. Ablations confirm that with Engram disabled, factual tasks collapse to 29-44% of their original performance, while reasoning tasks remain largely intact.
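To make the lookup-and-gate idea concrete, here is a minimal sketch in plain Python. It is not DeepSeek's implementation. The table size, embedding width, multipliers, and function names (`ngram_hash`, `lookup`, `gated_memory`) are all hypothetical, the tables are random rather than learned, and the hidden state is used directly as the query with the retrieved vector serving as both key and value (no learned projections). The depthwise convolution is omitted.

```python
import math
import random

NUM_HEADS = 8        # the summary mentions 8 independent tables
TABLE_SIZE = 1009    # hypothetical; a tiny prime so the demo stays small
HEAD_DIM = 4         # hypothetical embedding width per table
MAX_NGRAM = 3        # hypothetical maximum n-gram length
MASK64 = (1 << 64) - 1

rng = random.Random(0)
# Hypothetical odd multipliers, one per (head, n-gram position).
MULTIPLIERS = [[rng.getrandbits(64) | 1 for _ in range(MAX_NGRAM)]
               for _ in range(NUM_HEADS)]
# Randomly initialised tables stand in for learned embeddings.
TABLES = [[[rng.gauss(0.0, 1.0) for _ in range(HEAD_DIM)]
           for _ in range(TABLE_SIZE)]
          for _ in range(NUM_HEADS)]


def ngram_hash(token_ids, multipliers, table_size=TABLE_SIZE):
    """Multiply each token ID by a per-position constant, XOR the
    products together, and reduce the result to a table slot."""
    h = 0
    for token_id, mult in zip(token_ids, multipliers):
        h ^= (token_id * mult) & MASK64
    return h % table_size


def lookup(token_ids):
    """Fetch one row per head and concatenate them into one memory vector.
    A collision in one head is unlikely to repeat in the other seven."""
    memory = []
    for head in range(NUM_HEADS):
        slot = ngram_hash(token_ids, MULTIPLIERS[head])
        memory.extend(TABLES[head][slot])
    return memory


def gated_memory(hidden, token_ids):
    """Scale the retrieved vector by sigmoid(query . key / sqrt(d))."""
    memory = lookup(token_ids)
    score = sum(q * k for q, k in zip(hidden, memory)) / math.sqrt(len(memory))
    gate = 1.0 / (1.0 + math.exp(-score))
    return gate, [gate * m for m in memory]


hidden_state = [0.1] * (NUM_HEADS * HEAD_DIM)   # placeholder hidden state
gate, injected = gated_memory(hidden_state, [101, 2054, 2003])
print(f"gate={gate:.3f}, injected dim={len(injected)}")
```

Note that `ngram_hash` needs only token IDs, not activations. That is what would let the table rows be fetched from CPU RAM ahead of the forward pass. The gate, by contrast, depends on the hidden state, so it is the step that decides how much of a retrieved row actually reaches the residual stream.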

27m watch time
