Run the new Qwen3.5 LLMs including Medium: Qwen3.5-35B-A3B, 27B, 122B-A10B, Small: Qwen3.5-0.8B, 2B, 4B, 9B and 397B-A17B on your local device!

Hacker News is a community-driven platform for sharing and discussing technology news, startups, and programming-related topics. Through user submissions and comments, Hacker News offers insights into emerging technology trends, industry developments, and entrepreneurial ventures. Readers can participate in discussions, share their insights, and stay informed about the latest advancements in technology and innovation.

Hacker News

Step-by-step guide for running Qwen3.5 LLMs locally across all model sizes (0.8B to 397B-A17B) using llama.cpp and LM Studio. Covers building llama.cpp with CUDA support, downloading GGUF quantized models from Unsloth's HuggingFace repo, and running inference with recommended sampling parameters for both thinking and non-thinking modes. Also includes serving models via llama-server with OpenAI-compatible API, enabling/disabling chain-of-thought reasoning, and tool calling setup.

Qwen3.5 - How to Run Locally Guide