A new study presents Infini-gram, a large n-gram language model that can complement neural LLMs and reduce their perplexity. The model is trained on 5 trillion tokens and supports n-grams of arbitrary length.
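The core idea behind arbitrarily large n-grams is backoff: given a query context, find the longest suffix of that context that actually occurs in the training corpus, then estimate the next-token distribution from its continuations. A minimal sketch of that idea, assuming a toy in-memory corpus (the function name is hypothetical; the real system relies on suffix-array indexing to make this feasible over trillions of tokens):

```python
# Toy sketch of the "infinite-gram" backoff idea: fall back to the longest
# context suffix that occurs in the corpus, then count its continuations.
# Illustrative only; a production system would use a suffix array, not a scan.
from collections import Counter

def infgram_next(corpus, context):
    """Return (matched suffix, continuation counts) for the longest suffix
    of `context` found in `corpus` (both are lists of tokens)."""
    for start in range(len(context)):  # try longest suffix first
        suffix = context[start:]
        counts = Counter(
            corpus[i + len(suffix)]
            for i in range(len(corpus) - len(suffix))
            if corpus[i:i + len(suffix)] == suffix
        )
        if counts:
            return suffix, counts
    return [], Counter(corpus)  # no match: back off to unigram counts

corpus = "the cat sat on the mat the cat ran".split()
suffix, counts = infgram_next(corpus, "on the cat".split())
# "on the cat" never occurs, so it backs off to the suffix ["the", "cat"],
# whose continuations in the corpus are "sat" and "ran".
```

The scan above is linear in corpus size per query; the paper's contribution is making the same longest-suffix lookup fast at trillion-token scale.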