Researchers have developed a method called StreamingLLM that allows chatbots to hold continuous conversations without crashing or slowing down. With a simple tweak to the key-value cache of large language models, the researchers found that retaining the first few tokens in the cache prevents the model from failing once the cache overflows. StreamingLLM has demonstrated significantly faster performance than competing methods, making it suitable for a wide range of AI applications.
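The tweak described above can be sketched as a cache-eviction policy: the first few "sink" tokens are never evicted, while the rest of the cache behaves as a sliding window over the most recent tokens. The class and parameter names below are illustrative, not the authors' actual API:

```python
from collections import deque

class SinkCache:
    """Toy illustration of an attention-sink cache policy (as in
    StreamingLLM): always keep the first `num_sinks` tokens, plus a
    sliding window of the most recent `window` tokens. Names and
    defaults here are assumptions, not the paper's implementation."""

    def __init__(self, num_sinks=4, window=8):
        self.num_sinks = num_sinks
        self.sinks = []                      # first tokens, never evicted
        self.recent = deque(maxlen=window)   # sliding window; oldest drop off

    def add(self, token):
        # Fill the sink slots first; afterwards, everything goes into
        # the bounded window and old entries are evicted automatically.
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token)
        else:
            self.recent.append(token)

    def contents(self):
        # What the model would attend over: sinks + recent window.
        return self.sinks + list(self.recent)
```

Feeding twenty tokens into this cache leaves the four sink tokens plus the eight most recent, so the attended context stays bounded no matter how long the conversation runs.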