Docker Model Runner now supports vllm-metal, a new backend that enables vLLM inference on macOS with Apple Silicon's Metal GPU. Developed collaboratively by Docker and the vLLM project, vllm-metal unifies MLX and PyTorch under a single compute pathway, leveraging Apple Silicon's unified memory for zero-copy tensor operations.
7 min read · From docker.com
Table of contents

- What is vllm-metal?
- How vllm-metal works
- Which models work with vllm-metal?
- vLLM everywhere with Docker Model Runner
- Get started
- Giving Back: vllm-metal is Now Open Source
- How does vllm-metal compare to llama.cpp?