Nearest Neighbor Speculative Decoding (NEST) is a new semi-parametric language modeling method that integrates real-world text spans into the generations of an existing LM, improving both quality and latency. NEST outperforms other methods on text completion and factuality-aware generation tasks, and is 1.8 times faster at inference without affecting attribution or fluency.