Running Your Coding Agent Locally: Lessons from a Real-World Experiment by S. Maestri and A. Soldano
A conference talk by S. Maestri and A. Soldano on what it takes to run a coding agent, specifically Claude Code, against models served locally on your own hardware. Topics include: reasons to want local inference (privacy, compliance, cost control, API stability); the toolchain (llama.cpp and LM Studio); model selection using benchmarks from artificialanalysis.io; GGUF quantization formats and their accuracy tradeoffs; memory requirements, including how the KV cache scales; the prefill and decode performance phases; key llama.cpp configuration parameters; and the Claude Code environment tweaks needed for local use. In a real-world experiment building a Java dashboard app, Claude Opus 4 finished in 15 minutes in a single pass, while a local Qwen 3.5 122B Q4 model took 48 minutes of planning plus 1 hour of implementation. The local model's result was comparable, but it was slower and required more iteration.
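As a rough illustration of the memory topic mentioned above, the sketch below estimates weight and KV cache sizes with the usual back-of-the-envelope formulas. It is not taken from the talk. The layer count, KV head count, head dimension, and the 4.5 bits-per-weight figure are hypothetical placeholders rather than the real configuration of any model, and the KV formula assumes a standard transformer in which every layer keeps a full key/value cache stored as 16-bit values.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_element: int = 2) -> int:
    """Approximate KV cache size in bytes for one sequence.

    The factor 2 accounts for storing both keys and values.
    The result grows linearly with context_tokens.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


if __name__ == "__main__":
    GIB = 2 ** 30

    # Hypothetical numbers, for illustration only.
    weights = weight_bytes(n_params=122e9, bits_per_weight=4.5)
    kv_32k = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                            context_tokens=32_768)
    kv_64k = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                            context_tokens=65_536)

    print(f"weights: {weights / GIB:.1f} GiB")
    print(f"KV cache at 32k context: {kv_32k / GIB:.1f} GiB")
    print(f"KV cache at 64k context: {kv_64k / GIB:.1f} GiB")
```

Under these assumptions, doubling the context window doubles the KV cache while the weight footprint stays fixed, which is why the context length chosen for the agent matters as much as the quantization level when sizing hardware.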