Meet kvcached (KV cache daemon): an open-source KV cache library for LLM serving on shared GPUs
kvcached is an open-source library that virtualizes KV cache memory for LLM serving on shared GPUs, addressing the problem of underutilized GPU memory in multi-model deployments. Using CUDA virtual memory, it enables elastic, on-demand allocation and reclamation of GPU memory pages, allowing multiple models to share resources efficiently. The library integrates with mainstream serving engines and reports 1.2x to 28x faster time-to-first-token in multi-LLM scenarios by replacing static memory reservations with dynamic allocation.
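To make the idea concrete, here is a toy Python sketch (not kvcached's actual API, and deliberately ignoring CUDA specifics) of what virtualizing the KV cache buys you: each model sees its own contiguous logical block space, but physical pages come from a single shared pool, are mapped only when first written, and return to the pool when reclaimed. All class and method names below are hypothetical illustrations.

```python
# Toy sketch of KV cache virtualization (hypothetical names, not kvcached's API):
# physical pages live in one shared pool; each model maps them on demand.

class PhysicalPagePool:
    """Shared pool of physical GPU pages (simulated as integer ids)."""
    def __init__(self, num_pages):
        self.free = list(range(num_pages))

    def acquire(self):
        if not self.free:
            raise MemoryError("GPU pages exhausted")
        return self.free.pop()

    def release(self, page):
        self.free.append(page)

class VirtualKVCache:
    """Per-model view: logical KV block -> physical page, filled lazily."""
    def __init__(self, pool):
        self.pool = pool
        self.page_table = {}  # logical block id -> physical page id

    def touch(self, block_id):
        # Map a physical page only when the block is first written,
        # instead of reserving the whole cache up front.
        if block_id not in self.page_table:
            self.page_table[block_id] = self.pool.acquire()
        return self.page_table[block_id]

    def reclaim(self, block_id):
        # Return the page to the shared pool for other models to reuse.
        page = self.page_table.pop(block_id)
        self.pool.release(page)

pool = PhysicalPagePool(num_pages=8)
model_a, model_b = VirtualKVCache(pool), VirtualKVCache(pool)
model_a.touch(0); model_a.touch(1)   # model A maps 2 pages on demand
model_b.touch(0)                     # model B draws from the same pool
model_a.reclaim(1)                   # freed page becomes available to B
assert len(pool.free) == 8 - 2       # 3 mapped, 1 reclaimed
```

In a static-reservation scheme, each model would pre-claim a fixed slice of GPU memory whether or not it is serving traffic; the point of the elastic scheme above is that idle models hold no pages, which is what drives the reported time-to-first-token gains under bursty multi-model load.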