A hands-on benchmark comparing MiniMax 2.5, Llama 3.1 (70B and 405B), DeepSeek-R1, and Qwen2.5-Coder 32B on four coding tasks (function generation, bug detection, refactoring, and multi-file context understanding), run locally on dual RTX 3090 GPUs using llama.cpp. Key findings: MiniMax 2.5 is the best all-rounder (1st or 2nd on 3 of 4 tasks, ~17.5 tok/s); Llama 3.1 405B produces the highest code quality but is too slow for interactive use (~7.8 tok/s); DeepSeek-R1 excels at debugging via chain-of-thought but is also slow (~9.8 tok/s); and Qwen2.5-Coder 32B is the only model that fits on a single GPU and has the fastest throughput (~36.8 tok/s), at lower quality. Llama 3.1 70B failed two of the four tasks, a larger quality drop from 405B than expected. Detailed VRAM requirements, scoring methodology, and actual model outputs are included.
MiniMax 2.5 vs Llama 3.1 vs DeepSeek-R1 Comparison

Table of Contents
Why Local Coding Models Matter in 2026
Models Under Test: Versions, Sizes, and Quantizations
Benchmark Methodology: Hardware, Prompts, and Scoring
Benchmark Results: The Full Comparison
Performance and Resource Usage Compared
Analysis: Strengths, Weaknesses, and Surprises
Which Model Should You Choose? Decision Framework
The State of Local Coding Models in 2026