Step-by-step guide for running Qwen3.5 LLMs locally across all model sizes (0.8B to 397B-A17B) using llama.cpp and LM Studio. Covers building llama.cpp with CUDA support, downloading GGUF quantized models from Unsloth's HuggingFace repo, and running inference with recommended sampling parameters for both thinking and non-thinking modes. Also includes serving models via llama-server with OpenAI-compatible API, enabling/disabling chain-of-thought reasoning, and tool calling setup.
Sort: