Researchers have developed a method called StreamingLLM that lets chatbots hold continuous conversations without crashing or slowing down. With a simple tweak to a large language model's key-value cache, they found that retaining the first few data points in the cache prevents the model from failing once the conversation grows longer than the cache can hold.
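The idea can be sketched as a simple eviction policy: when the key-value cache fills up, keep the first few entries (the "attention sinks") plus a sliding window of the most recent entries, and drop the middle. This is a minimal illustration only; the function name and the `n_sink`/`window` sizes are assumptions, not the paper's defaults.

```python
# Sketch of an attention-sink eviction policy for a key-value cache.
# n_sink and window are illustrative values chosen for this example.

def evict_kv_cache(cache, n_sink=4, window=1020):
    """Return the cache after eviction: the first n_sink entries
    (the attention sinks) plus the most recent `window` entries."""
    if len(cache) <= n_sink + window:
        return cache  # still under capacity; nothing to drop
    return cache[:n_sink] + cache[-window:]

# Example: a cache of 2000 token entries shrinks to 4 + 1020 entries,
# while the earliest tokens stay in memory.
cache = list(range(2000))
trimmed = evict_kv_cache(cache)
print(len(trimmed), trimmed[:4])  # 1024 [0, 1, 2, 3]
```

Because the sinks and the recent window have fixed sizes, the cache never grows past `n_sink + window` entries no matter how long the conversation runs.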

From news.mit.edu
