ds4.c is a narrow, Metal-only local inference engine built specifically for DeepSeek V4 Flash, a 284B-parameter MoE model. It is not a generic GGUF runner but a purpose-built engine with DS4-specific loading, KV state management, and an OpenAI/Anthropic-compatible HTTP server API. Key features include 2-bit quantization that lets the model run on MacBooks with 128GB of RAM, a disk-based KV cache that persists session state across restarts, a 1M-token context window, and integration with coding agents such as Claude Code, opencode, and Pi. The disk KV cache design treats SSD storage as a first-class citizen for KV state, enabling long-context inference without keeping everything in RAM.

Performance benchmarks show 26 t/s generation on an M3 Max MacBook Pro and 36 t/s on an M3 Ultra Mac Studio. The project was built with significant GPT-4.5 assistance and is explicitly alpha-quality: it is Metal-only (the CPU path crashes macOS due to a VM bug) and works only with the custom GGUF files published by the author.

15 min read · From github.com
Table of contents
- Acknowledgements to llama.cpp and GGML
- Model Weights
- Speed
- CLI
- Server
- Thinking Modes
- Disk KV Cache
- Backends
- Test Vectors
