Meet kvcached (KV cache daemon): an open-source KV cache library for LLM serving on shared GPUs
kvcached is an open-source library that virtualizes KV cache memory for LLM serving on shared GPUs, addressing the problem of underutilized GPU memory in multi-model deployments. Using CUDA virtual memory, it allocates and reclaims GPU memory pages elastically and on demand, allowing multiple models to share resources.
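To make the idea concrete, here is a minimal toy model in plain Python. It is not kvcached's API and touches no GPU: `PagePool` and `VirtualKVCache` are hypothetical names invented for illustration. Each model reserves a large range of virtual slots up front, but a physical page is taken from the shared pool only when a slot is actually used, and returned when it is no longer needed.

```python
class PagePool:
    """Toy stand-in for one GPU's physical memory, split into fixed-size pages."""

    def __init__(self, total_pages):
        self.free_pages = list(range(total_pages))


class VirtualKVCache:
    """Hypothetical per-model KV cache: virtual slots backed by pages on demand."""

    def __init__(self, name, pool, reserved_slots):
        self.name = name
        self.pool = pool
        self.reserved_slots = reserved_slots  # virtual range only, costs no pages
        self.mapping = {}                     # virtual slot -> physical page id

    def map_slot(self, slot):
        """Back a virtual slot with a physical page the first time it is used."""
        if not 0 <= slot < self.reserved_slots:
            raise IndexError(f"slot {slot} is outside the reserved range")
        if slot in self.mapping:
            return self.mapping[slot]
        if not self.pool.free_pages:
            raise MemoryError("no free physical pages left in the shared pool")
        page = self.pool.free_pages.pop()
        self.mapping[slot] = page
        return page

    def unmap_slot(self, slot):
        """Return the page behind a slot to the shared pool."""
        page = self.mapping.pop(slot)
        self.pool.free_pages.append(page)


# Two models share 8 physical pages; each reserves 8 virtual slots.
pool = PagePool(total_pages=8)
model_a = VirtualKVCache("model_a", pool, reserved_slots=8)
model_b = VirtualKVCache("model_b", pool, reserved_slots=8)

# model_a handles a burst and maps 6 pages.
for slot in range(6):
    model_a.map_slot(slot)
free_after_burst = len(pool.free_pages)

# The burst ends: model_a gives its pages back, and model_b can grow into them.
for slot in range(6):
    model_a.unmap_slot(slot)
for slot in range(7):
    model_b.map_slot(slot)
free_after_b = len(pool.free_pages)
```

With a static split, each model would be capped at 4 pages regardless of load. In this sketch either model can grow to nearly the whole pool while the other is idle, which is the kind of sharing the description above refers to.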