A hands-on benchmark comparing MiniMax 2.5, Llama 3.1 (70B and 405B), DeepSeek-R1, and Qwen2.5-Coder 32B on four coding tasks—function generation, bug detection, refactoring, and multi-file context understanding—run locally on dual RTX 3090 GPUs using llama.cpp. Key finding: MiniMax 2.5 is the best all-rounder, placing 1st or 2nd on 3 of the 4 tasks.
Table of contents

- Why Local Coding Models Matter in 2026
- Models Under Test: Versions, Sizes, and Quantizations
- Benchmark Methodology: Hardware, Prompts, and Scoring
- Benchmark Results: The Full Comparison
- Performance and Resource Usage Compared
- Analysis: Strengths, Weaknesses, and Surprises
- Which Model Should You Choose? Decision Framework
- The State of Local Coding Models in 2026