
A practical benchmark comparing MiniMax2.5, Llama 3 (8B and 70B), Mistral Large 2 (123B), and Gemma 2 (9B and 27B) running locally via Ollama at Q4_K_M quantization on two GPU tiers (RTX 4090 and RTX 3060). Tests cover coding accuracy (Python and JavaScript Pass@1), reasoning, creative/chat quality, inference speed (tokens per second), and VRAM consumption. Key findings: MiniMax2.5 leads on JavaScript coding; Llama 3 70B tops Python accuracy and reasoning; Mistral Large 2 wins on chat quality but is slowest and most VRAM-hungry; Gemma 2 9B and Llama 3 8B are the only viable options for interactive use on 12 GB VRAM. Reproducible Python and Node.js benchmark scripts are provided, along with hardware-specific recommendations and a full comparison table.

#llm

#llama

#ollama

Mar 16 • 23m read time • From sitepoint.com

Table of contents

LLM Benchmarks 2026 Comparison

Why Benchmark Local Models Yourself?
Methodology: How We Tested
The Contenders: Model Profiles
Results: Coding Performance
Results: Inference Speed
Results: Memory and VRAM Usage
Results: Reasoning and Creative Tasks
The Verdict: Choosing the Right Model
How to Run These Benchmarks Yourself
