Find the local LLM that actually runs — and performs best — on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly. - Andyyyy64/whichllm


whichllm is a Python CLI tool that auto-detects your GPU, CPU, and RAM and ranks the local LLMs from Hugging Face that will actually run on your hardware. Unlike simple VRAM-fit tools, it uses recency-aware benchmark scores from sources like LiveBench, Artificial Analysis, Aider, and Chatbot Arena Elo to rank models by real performance rather than parameter count. It supports GPU simulation for purchase planning, one-command model download and chat via `whichllm run`, Python code snippet generation, Ollama integration, and JSON output for scripting. The scoring system accounts for quantization penalties, evidence confidence levels, partial offload, and MoE architecture specifics.
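To make the scoring idea concrete, here is a minimal sketch of how a ranking of this kind could combine a benchmark score with a quantization penalty, an evidence-confidence weight, and a partial-offload penalty. It is not whichllm's actual algorithm: every name (`Candidate`, `weights_gb`, `score`, `rank`), the bits-per-weight and penalty tables, and the fixed `overhead_gb` are assumptions made up for illustration, and MoE handling is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical illustration values, NOT taken from whichllm.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "F16": 16.0}
QUANT_PENALTY = {"Q4": 0.95, "Q8": 0.99, "F16": 1.0}


@dataclass
class Candidate:
    name: str
    params_b: float    # total parameters, in billions
    quant: str         # key into the tables above
    benchmark: float   # benchmark score on a 0-100 scale
    confidence: float  # 0-1, how well-evidenced the score is


def weights_gb(c: Candidate) -> float:
    """Approximate weight size in GB: billions of params * bits / 8."""
    return c.params_b * BITS_PER_WEIGHT[c.quant] / 8


def score(c: Candidate, vram_gb: float, overhead_gb: float = 1.5) -> float:
    """Benchmark score scaled by quantization, confidence, and GPU fit."""
    need = weights_gb(c) + overhead_gb  # assumed fixed runtime overhead
    base = c.benchmark * QUANT_PENALTY[c.quant] * c.confidence
    if need <= vram_gb:
        return base
    # Partial offload (assumption): scale by the fraction that fits on the GPU.
    return base * (vram_gb / need)


def rank(cands: list[Candidate], vram_gb: float) -> list[Candidate]:
    """Return candidates sorted from best to worst for this VRAM budget."""
    return sorted(cands, key=lambda c: score(c, vram_gb), reverse=True)


if __name__ == "__main__":
    models = [
        Candidate("big-70b", 70, "Q4", 80.0, 1.0),
        Candidate("small-8b", 8, "Q8", 70.0, 1.0),
    ]
    for m in rank(models, vram_gb=12):
        print(m.name, round(score(m, 12), 1))
```

With a small VRAM budget the larger model is pushed down by the offload penalty even though its raw benchmark score is higher, which is the behaviour that separates this style of ranking from sorting by parameter count.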


<p>I think <code>llmfit</code> is way more versatile and faster.</p>