Dickson A. (@fabidick22) • May 15

GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal

Hacker News • From github.com • May 07 • 15m read time

ds4.c is a narrow, Metal-only local inference engine built specifically for DeepSeek V4 Flash, a 284B-parameter MoE model. It is not a generic GGUF runner: it has DS4-specific loading and KV state management, and it exposes an OpenAI/Anthropic-compatible HTTP server API. Key features include 2-bit quantization that lets the model run on MacBooks with 128GB of RAM, a disk-based KV cache that persists session state across restarts, a 1M-token context window, and integration with coding agents such as Claude Code, opencode, and Pi. The disk KV cache treats SSD storage as a first-class home for KV state, which enables long-context inference without keeping everything in RAM. Reported benchmarks show generation at 26 tokens per second (t/s) on an M3 Max MacBook Pro and 36 t/s on an M3 Ultra Mac Studio. The project was built with significant GPT-4.5 assistance and is explicitly alpha-quality. It is Metal-only (the CPU path crashes macOS due to a VM bug) and works only with the custom GGUF files published by the author.
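
To make the numbers in the summary concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the ds4 project. It assumes, for simplicity, that every one of the 284B weights is stored at exactly 2 bits, which real quantized files generally do not do (some tensors are usually kept at higher precision, and quantization scales add overhead). The helper names `weight_footprint_gb` and `generation_seconds` are hypothetical and exist only for this illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: a uniform bit width for all weights, with no metadata overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


PARAMS = 284e9      # parameter count quoted in the summary
RAM_GB = 128        # MacBook memory size quoted in the summary

two_bit_gb = weight_footprint_gb(PARAMS, 2)      # idealized 2-bit footprint
sixteen_bit_gb = weight_footprint_gb(PARAMS, 16)  # 16-bit footprint, for contrast

print(f"2-bit weights:  ~{two_bit_gb:.0f} GB (fits in {RAM_GB} GB: {two_bit_gb < RAM_GB})")
print(f"16-bit weights: ~{sixteen_bit_gb:.0f} GB (fits in {RAM_GB} GB: {sixteen_bit_gb < RAM_GB})")

# Decode rates quoted in the summary, applied to a 1,000-token reply.
for machine, rate in [("M3 Max MacBook Pro", 26), ("M3 Ultra Mac Studio", 36)]:
    print(f"{machine}: ~{generation_seconds(1000, rate):.0f} s per 1,000 tokens")
```

Under these simplifying assumptions the weights alone come to roughly 71 GB at 2 bits versus roughly 568 GB at 16 bits, which is consistent with the claim that the quantized model fits on a 128GB machine. The real footprint will differ, and the KV cache, activations, and the operating system also need memory.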

