A practical guide to running LLMs locally on a mini PC built around an AMD Ryzen AI Max+ 395 (Strix Halo) processor with a unified memory architecture. It covers hardware options and costs (from $2,100 mini PCs to $10,000 GPUs) and key concepts such as inference, VRAM requirements, and quantization, then walks through installing Fedora, configuring GTT memory allocation, and running models via llama.cpp using a community toolbox. It also covers selecting models from HuggingFace, using Open WebUI with web search, and integrating local AI into coding workflows via Continue and OpenCode. The guide concludes that local AI works well for basic Q&A and web search, but still falls short of Claude Opus for complex agentic coding tasks without extensive guardrails such as tests and spec-driven development.
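The GTT step is the least obvious part of that setup, so a minimal sketch may help. This is an illustration rather than the article's exact commands: it assumes a Fedora install, a 128 GB machine, and a hypothetical model path. The kernel module parameters and llama.cpp flags shown are real, but the sizes must be adjusted to your own RAM.

```bash
# Sketch (assumptions noted above): raise the GTT limit so the iGPU can map
# most of the unified RAM, then serve a GGUF model with llama.cpp.

# On Fedora, append kernel arguments with grubby. The ttm values are counted
# in 4 KiB pages; 26214400 pages ≈ 100 GiB of RAM made available as GTT,
# which assumes a 128 GB machine.
sudo grubby --update-kernel=ALL \
  --args="ttm.pages_limit=26214400 ttm.page_pool_size=26214400"
sudo reboot

# After reboot, launch an OpenAI-compatible server with all layers offloaded
# to the GPU. The model path is a placeholder for whatever GGUF file you
# download from HuggingFace.
llama-server -m ~/models/your-model-Q4_K_M.gguf \
  -ngl 99 -c 8192 --host 0.0.0.0 --port 8080
```

Once the server is up, Open WebUI, Continue, or OpenCode can point at it as an OpenAI-compatible endpoint (here, `http://localhost:8080/v1`).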