LangChain's evaluation results show that open-weight models GLM-5 and MiniMax M2.7 now match closed frontier models (Claude Opus, GPT-5.4, Gemini) on core agentic tasks including file operations, tool use, and instruction following — at 8–10x lower cost and significantly lower latency. The post details the eval methodology used in the Deep Agents harness (correctness, solve rate, step ratio, tool call ratio), presents per-category benchmark data, and explains how to swap in open models with a one-line code change. It also covers harness-level abstractions that handle context window differences and tool-calling format variations, plus a CLI feature for runtime model swapping mid-session.
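The one-line swap itself isn't reproduced in this excerpt. As a minimal sketch of the pattern the post describes (every name below is hypothetical, not the Deep Agents SDK's actual API), parameterizing the agent by a single provider-prefixed model string makes switching to an open-weight model a one-line change:

```python
# Illustrative only: create_agent and the model identifiers are made-up
# stand-ins, not the real SDK surface. The point is the shape of the API:
# the agent takes one "provider:model" string, so swapping providers means
# editing that one string.

def create_agent(model_id: str) -> dict:
    """Build a toy agent config from a 'provider:model' identifier."""
    provider, _, name = model_id.partition(":")
    return {"provider": provider, "model": name}

# Closed frontier model:
agent = create_agent("anthropic:claude-opus")

# The one-line change to an open-weight model:
agent = create_agent("zai:glm-5")
```

Keeping the model choice in a single string is also what lets a CLI swap models at runtime mid-session: the harness only needs to rebuild the agent from a new identifier, with the harness-level abstractions absorbing context-window and tool-calling differences.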
Table of contents
Why open models
How we evaluated
Findings from our evals
Using open models in Deep Agents SDK
What's next