#VDZ26 
Running Your Coding Agent Locally: Lessons from a Real-World Experiment by Stefano Maestri and Alessio Soldano

Cloud-based coding assistants like Claude Code or GitHub Copilot are powerful—but what happens when you try to bring that experience fully on-premise?
In this talk, we’ll explore the practical journey of building and running a local AI coding setup: choosing models, hosting them on consumer hardware, connecting frontends like LM Studio, and evaluating what really works (and what doesn’t). We’ll discuss trade-offs in latency, memory, and tool integration, the role of KV cache and model routing, and how far open-source models can go in replicating commercial AI dev environments.
Expect a mix of architecture insights, debugging war stories, and honest conclusions about what’s currently feasible—and what’s still wishful thinking—when it comes to local AI coding.

Devoxx

A conference talk by S. Maestri and A. Soldano covering what it takes to run coding agents (specifically Claude Code) locally on your own hardware. Topics include: why you might want local inference (privacy, compliance, cost control, API stability), the toolchain (llama.cpp + LM Studio), model selection using benchmarks from artificialanalysis.io, understanding GGUF quantization formats and their accuracy tradeoffs, memory requirements including KV cache scaling, prefill vs decode performance phases, key llama.cpp configuration parameters, and Claude Code environment tweaks needed for local use. A real-world experiment compared Claude Opus 4 (15 min, single pass) against a local Qwen 3.5 122B Q4 model (48 min planning + 1 hr implementation) for building a Java dashboard app, with the local model producing comparable but slower results requiring more iteration.
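The KV cache scaling mentioned above can be illustrated with a rough back-of-the-envelope estimate. This is only a sketch, not something taken from the talk: it assumes a standard transformer with grouped-query attention and an unquantized 16-bit cache, and the model dimensions in the example are hypothetical placeholders rather than those of any specific model. Architectures with sliding-window or linear-attention layers, or a quantized cache, would need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    The leading factor 2 accounts for storing both a key and a value
    vector per layer, per KV head, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    for ctx in (8192, 32768, 131072):
        size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=ctx)
        print(f"context {ctx:>6} tokens: {to_gib(size):.1f} GiB")
```

The estimate grows linearly with context length, which is why the long contexts a coding agent tends to need can consume a large share of memory on top of the model weights.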
