whichllm is a Python CLI tool that auto-detects your GPU, CPU, and RAM and ranks the best local LLMs from HuggingFace that will actually run on your hardware. Unlike simple VRAM-fit tools, it ranks models by measured performance rather than parameter count, using recency-aware benchmark scores from sources such as LiveBench, Artificial Analysis, Aider, and Chatbot Arena Elo. It supports GPU simulation for purchase planning, one-command model download and chat via `whichllm run`, Python code snippet generation, Ollama integration, and JSON output for scripting. The scoring system accounts for quantization penalties, evidence confidence levels, partial offload, and MoE architecture specifics.
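
To make the scoring idea concrete, here is a minimal sketch of how a benchmark score might be combined with a quantization penalty, an evidence confidence weight, and a partial-offload penalty. This is not whichllm's actual formula: the `Candidate` fields, the `BITS_PER_WEIGHT` and `QUANT_PENALTY` tables, the `overhead_gb` default, and the linear offload penalty are all assumptions chosen for illustration, and MoE handling is left out.

```python
from dataclasses import dataclass

# Hypothetical lookup tables for illustration; not whichllm's real values.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8": 8.0, "q4": 4.0}
QUANT_PENALTY = {"fp16": 1.00, "q8": 0.98, "q4": 0.93}


@dataclass
class Candidate:
    name: str
    params_b: float    # parameter count, in billions
    quant: str         # key into the tables above
    benchmark: float   # benchmark score on a 0-100 scale
    confidence: float  # evidence confidence, 0.0-1.0


def weights_gb(c: Candidate) -> float:
    """Approximate weight size: billions of params times bytes per param."""
    return c.params_b * BITS_PER_WEIGHT[c.quant] / 8


def adjusted_score(c: Candidate, vram_gb: float, overhead_gb: float = 1.5) -> float:
    """Scale the benchmark score by quantization, confidence, and fit."""
    needed = weights_gb(c) + overhead_gb  # overhead_gb is an assumed allowance
    # Assumed penalty: a model that spills out of VRAM (partial offload)
    # is scaled by the fraction of it that stays on the GPU.
    fit = 1.0 if needed <= vram_gb else vram_gb / needed
    return c.benchmark * QUANT_PENALTY[c.quant] * c.confidence * fit


def rank(candidates: list[Candidate], vram_gb: float) -> list[Candidate]:
    """Return candidates ordered from best to worst adjusted score."""
    return sorted(candidates, key=lambda c: adjusted_score(c, vram_gb), reverse=True)


if __name__ == "__main__":
    models = [
        Candidate("large-q4", params_b=70, quant="q4", benchmark=80, confidence=1.0),
        Candidate("small-q4", params_b=7, quant="q4", benchmark=60, confidence=1.0),
    ]
    for m in rank(models, vram_gb=8):
        print(f"{m.name}: {adjusted_score(m, 8):.1f}")
```

With 8 GB of VRAM in this toy setup, the smaller model ranks first even though its raw benchmark score is lower, because the larger one would run mostly offloaded. That is the kind of trade-off a pure VRAM-fit or parameter-count ranking would miss.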

From github.com