NVIDIA's AI-Q deep research agent achieved first place on both DeepResearch Bench (55.95) and DeepResearch Bench II (54.50). The system uses a multi-agent architecture with three core components: an orchestrator, a planner (with Scout and Architect subagents), and a researcher (with five specialist subagents). It is built on NVIDIA NeMo Agent Toolkit and LangChain DeepAgents, powered by a custom fine-tuned Nemotron-3-Super-120B-A12B model trained on ~67k filtered SFT trajectories. Key engineering decisions include evidence-grounded planning, custom middleware for long-horizon reliability (tool name sanitization, reasoning-aware retry, budget enforcement, report validation), and an optional ensemble plus post-hoc refiner for maximum report quality. The stack is open, modular, and configurable via YAML.
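To make two of the middleware ideas named above concrete (tool name sanitization and budget enforcement), here is a minimal, framework-free sketch in plain Python. It is not the AI-Q implementation and does not use the NeMo Agent Toolkit or DeepAgents APIs. The names `sanitize_tool_name`, `ToolBudget`, and `dispatch` are hypothetical, and the normalization rule (lowercase, non-alphanumerics collapsed to underscores) is an assumption chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


def sanitize_tool_name(raw: str, registered: Set[str]) -> Optional[str]:
    """Normalize a model-emitted tool name and match it to a registered tool.

    Assumed rule (illustrative only): lowercase the name and collapse every
    run of non-alphanumeric characters into a single underscore.
    """
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return cleaned if cleaned in registered else None


@dataclass
class ToolBudget:
    """Hypothetical per-run cap on the number of tool calls."""
    max_calls: int
    used: int = 0

    def try_spend(self) -> bool:
        # Refuse the call once the cap is reached; otherwise count it.
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


def dispatch(raw_name: str,
             tools: Dict[str, Callable[[], str]],
             budget: ToolBudget) -> str:
    """Route one tool call through sanitization, then budget enforcement."""
    name = sanitize_tool_name(raw_name, set(tools))
    if name is None:
        return "error: unknown tool"
    if not budget.try_spend():
        return "error: budget exhausted"
    return tools[name]()
```

In this sketch an unknown tool name is rejected before any budget is spent, so a malformed call from the model does not consume the run's allowance. A real agent would more likely return these errors to the model as tool messages so that it can correct itself.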

9 min read. From huggingface.co
Table of contents
Why Winning Both Benchmarks Matters
Architecture at a Glance
Core Stack: NVIDIA and Deep Research
AI-Q Deep Researcher
Takeaways
