Docker Model Runner now supports vllm-metal, a new backend enabling vLLM inference on macOS with Apple Silicon's Metal GPU. Developed collaboratively by Docker and the vLLM project, vllm-metal unifies MLX and PyTorch under a single compute pathway, leveraging Apple Silicon's unified memory for zero-copy tensor operations.
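As a rough illustration of the zero-copy idea (not vllm-metal's actual internals), the sketch below shows an MLX array and a PyTorch tensor sharing a single host-visible buffer in Apple Silicon's unified memory. It assumes mlx, numpy, and torch are installed and that the installed versions support copy-free buffer export.

```python
# Minimal sketch of the zero-copy idea on Apple Silicon unified memory.
# This is NOT vllm-metal's internal code path; it only illustrates how
# MLX and PyTorch can view the same buffer without copying.
import mlx.core as mx
import numpy as np
import torch

a = mx.arange(4.0)    # MLX array allocated in unified memory
mx.eval(a)            # force MLX's lazy evaluation before exporting the buffer

# NumPy view over the MLX buffer; copy=False asserts no copy is made
# (assumes the installed MLX/NumPy versions allow copy-free export).
np_view = np.array(a, copy=False)

# torch.from_numpy shares memory with the NumPy array, so all three
# objects reference the same underlying unified-memory buffer.
t = torch.from_numpy(np_view)

print(t)  # tensor([0., 1., 2., 3.])
```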

Table of contents
- What is vllm-metal?
- How vllm-metal works
- Which models work with vllm-metal?
- vLLM everywhere with Docker Model Runner
- Get started
- Giving Back: vllm-metal is Now Open Source
- How does vllm-metal compare to llama.cpp?
