A practical guide to deploying DeepSeek R1 distilled models (7B, 14B, 32B) locally using two paths: Ollama for quick single-user experimentation and vLLM with Docker for production serving. Covers hardware requirements and VRAM calculations across NVIDIA GPUs, Apple Silicon, and CPU-only setups; quantization format trade-offs between GGUF, AWQ, and GPTQ; step-by-step Ollama and Docker Compose configuration with code examples; OpenAI-compatible API integration; and performance optimization tips including Flash Attention 2, KV cache tuning, and Metal acceleration on macOS.
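To give a quick feel for the VRAM calculations the hardware section walks through, here is a minimal rule-of-thumb sketch. It assumes quantized weights dominate memory use; the function name and the flat 2 GB allowance for KV cache and runtime buffers are illustrative assumptions, not figures from the guide.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead_gb: float = 2.0) -> float:
    """Rule-of-thumb VRAM estimate: quantized weights plus a flat
    allowance for KV cache, activations, and runtime buffers."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ~ 1 GB
    return weights_gb + overhead_gb

# A 14B model at 4-bit quantization: ~7 GB of weights, ~9 GB total.
print(f"{estimate_vram_gb(14, 4):.1f} GB")
```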
Table of contents
- Why Deploy DeepSeek R1 Locally?
- Hardware Requirements for DeepSeek R1 Local Deployment
- Path 1: Deploying DeepSeek R1 with Ollama
- Path 2: Production Deployment with vLLM and Docker
- Quantization Options and Performance Trade-offs
- Performance Optimization Tips
- Quick-Start Deployment Checklist
- Next Steps
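As a preview of the OpenAI-compatible API integration covered below, here is a minimal sketch that points the standard `openai` Python client at a local server. It assumes an Ollama instance on its default port with the `deepseek-r1:14b` tag already pulled; a vLLM server would typically use `http://localhost:8000/v1` instead.

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint at /v1 on port 11434.
# The api_key is unused by Ollama but required by the client.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1:14b",  # model tag as pulled via `ollama pull`
    messages=[{"role": "user", "content": "Explain the KV cache in one paragraph."}],
)
print(response.choices[0].message.content)
```

Because both Ollama and vLLM speak the OpenAI API dialect, the same client code works against either backend with only the base URL and model name changed.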