Learn about vLLM, a framework that speeds up language model inference by introducing PagedAttention. Also discover TGI (Text Generation Inference), another framework for accelerating LLM inference that offers tensor parallelism and dynamic batching.
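
As a rough illustration of the post's topic, here is a minimal sketch of vLLM's offline inference API (the model name is a placeholder; PagedAttention memory management runs under the hood):

    from vllm import LLM, SamplingParams

    # Load a model; "facebook/opt-125m" is just an example checkpoint.
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings for generation.
    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

    # Generate completions; vLLM batches and schedules requests internally.
    outputs = llm.generate(["What is PagedAttention?"], sampling_params)
    for out in outputs:
        print(out.outputs[0].text)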

2 min read · from blog.gopenai.com
