A short guide on setting up Qwen3.6 35B A3B as a local FIM (fill-in-the-middle) autocomplete model using llama-server and the Zed editor. The post covers the llama-server launch command with recommended flags for VRAM efficiency, and the Zed configuration needed to enable edit predictions via the OpenAI-compatible API. The author notes completions work but tend to be too long due to limited post-processing in Zed, suggesting a proxy layer could improve results.
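As a rough illustration of the kind of post-processing a proxy layer could apply to overly long completions, here is a minimal sketch. The post does not say what such a proxy would do, so the function name `trim_completion`, the `max_lines` default, and the "stop at the first blank line" rule are all assumptions rather than the author's method.

```python
def trim_completion(text: str, max_lines: int = 3) -> str:
    """Shorten a FIM completion (hypothetical helper, not from the post).

    Keeps lines until the first blank line that follows real content,
    and never keeps more than max_lines lines in total.
    """
    kept = []
    seen_content = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line after real content is treated as the end of the edit.
            if seen_content:
                break
        else:
            seen_content = True
        kept.append(line)
        if len(kept) >= max_lines:
            break
    return "\n".join(kept)
```

A proxy sitting between Zed and llama-server could apply a function like this to each completion before returning the response; the right cutoff rule would need tuning against real completions.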

2m read time · From randomhacks.net
