NVIDIA's AI-Q deep research agent achieved first place on both DeepResearch Bench (55.95) and DeepResearch Bench II (54.50). The system uses a multi-agent architecture with three core components: an orchestrator, a planner (with Scout and Architect subagents), and a researcher (with five specialist subagents). It is built on NVIDIA NeMo Agent Toolkit and LangChain DeepAgents, powered by a custom fine-tuned Nemotron-3-Super-120B-A12B model trained on ~67k filtered SFT trajectories. Key engineering decisions include evidence-grounded planning, custom middleware for long-horizon reliability (tool name sanitization, reasoning-aware retry, budget enforcement, report validation), and an optional ensemble plus post-hoc refiner for maximum report quality. The stack is open, modular, and configurable via YAML.
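To make two of the middleware ideas above more concrete, here is a minimal sketch of tool name sanitization and budget enforcement. It is not AI-Q's implementation: the tool registry, `sanitize_tool_name`, and `ToolBudget` are hypothetical names invented for illustration, and the sketch assumes that "sanitization" means mapping a malformed model-emitted tool name back onto a registered one.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical tool registry; these names are illustrative, not AI-Q's actual tools.
KNOWN_TOOLS = {"web_search", "read_page", "write_report"}


def sanitize_tool_name(raw: str, known: Set[str] = KNOWN_TOOLS) -> Optional[str]:
    """Map a possibly malformed model-emitted tool name to a registered one."""
    # Lowercase, then collapse any run of non-alphanumeric characters to "_".
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    if cleaned in known:
        return cleaned
    # Fall back to a registered name embedded in the string (e.g. a namespace prefix).
    for name in sorted(known, key=len, reverse=True):
        if name in cleaned:
            return name
    # Unknown tool: the caller can reject the call or ask the model to retry.
    return None


@dataclass
class ToolBudget:
    """Caps the number of tool calls allowed in one research run."""
    max_calls: int
    used: int = 0

    def try_spend(self) -> bool:
        """Record one call and return True, or return False once the cap is hit."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True
```

In a real agent loop, a wrapper around each tool call would presumably run both checks before dispatching: reject or repair an unrecognized name, and stop issuing calls once the budget is exhausted so the run can move on to report writing.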
Table of contents
Why Winning Both Benchmarks Matters
Architecture at a Glance
Core Stack: NVIDIA and Deep Research
AI-Q Deep Researcher
Takeaways