Learn how to deploy low-cost, open source AI technologies at scale for generative AI applications, using alternatives to OpenAI and running vLLM both locally and on Kubernetes.

12 min read · From opensauced.pizza
Table of contents

- Running open source inference engines locally
- Choosing vLLM for production
- Running vLLM locally
- Using Kubernetes for a large-scale vLLM service
- Getting the cluster ready
- Deploying a vLLM DaemonSet
- Load balancing with an internal Kubernetes service
- Results
- Conclusion