A practical guide to running LLMs locally on a mini PC built around an AMD Ryzen AI Max+ 395 (Strix Halo) processor with a unified memory architecture. It covers hardware options and costs (from $2,100 mini PCs to $10,000 GPUs) and key concepts such as inference, VRAM requirements, and quantization, then walks through installing Fedora, configuring GTT memory allocation, and running models via llama.cpp using a community toolbox. It also covers selecting models from HuggingFace, using Open WebUI with web search, and integrating local AI into coding workflows via Continue and OpenCode. The guide concludes that local AI works well for basic Q&A and web search, but still falls short of Claude Opus for complex agentic coding tasks without extensive guardrails such as tests and spec-driven development.
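The GTT step is the least obvious part of that setup, so a minimal sketch may help. This is an illustration rather than the article's exact commands: it assumes a Fedora install, a 128 GB machine, and a hypothetical model path. The kernel module parameters and llama.cpp flags shown are real, but the sizes must be adjusted to your own RAM.

```bash
# Sketch (assumptions noted above): raise the GTT limit so the iGPU can map
# most of the unified RAM, then serve a GGUF model with llama.cpp.

# On Fedora, append kernel arguments with grubby. The ttm values are counted
# in 4 KiB pages; 26214400 pages ≈ 100 GiB of RAM made available as GTT,
# which assumes a 128 GB machine.
sudo grubby --update-kernel=ALL \
  --args="ttm.pages_limit=26214400 ttm.page_pool_size=26214400"
sudo reboot

# After reboot, launch an OpenAI-compatible server with all layers offloaded
# to the GPU. The model path is a placeholder for whatever GGUF file you
# download from HuggingFace.
llama-server -m ~/models/your-model-Q4_K_M.gguf \
  -ngl 99 -c 8192 --host 0.0.0.0 --port 8080
```

Once the server is up, Open WebUI, Continue, or OpenCode can point at it as an OpenAI-compatible endpoint (here, `http://localhost:8080/v1`).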