Best of ai-agents — November 2025

1
Article
databricks·27w
Building Custom LLM Judges for AI Agent Accuracy
MLflow introduces three new capabilities for evaluating AI agents: Tunable Judges for creating custom LLM evaluators using natural language instructions, Agent-as-a-Judge for automatically identifying relevant trace data without manual parsing, and Judge Builder for visual judge management with domain expert feedback. These tools enable teams to build domain-specific evaluation criteria, align judges with human feedback through continuous tuning, and scale quality assessment from prototype to production. The make_judge SDK simplifies creating custom judges, while alignment optimization incorporates subject matter expert feedback to improve evaluation accuracy over time.
55
2
2
Article
Aishwary Gupta·25w
OpenAI dropped a cookbook on Self-Evolving Agents
OpenAI released a comprehensive cookbook featuring open-source examples and tutorials for building applications with their API. The collection covers fundamental API usage through advanced implementations including fine-tuning, RAG, function calling, vector databases, multimodal applications, and self-evolving agent development. Practical guides span GPT models, embeddings, image generation, speech processing, and platform integrations.
43
3
Video
Theo - t3․gg·26w
Anthropic admits that MCP sucks
Anthropic published guidance showing that having agents write code to call Model Context Protocol (MCP) servers, rather than invoking MCP tools directly, can cut token usage by 98.7%. In the article's example, usage drops from 150,000 to 2,000 tokens because tool definitions and intermediate results no longer bloat the context window. This approach enables on-demand tool loading, data filtering before results reach the model, and better privacy controls, though it requires a secure, sandboxed execution environment.
38
2
4
Video
JavaScript Mastery·27w
Build AI Agents with n8n | Complete Beginner’s Automation Course 2025
A comprehensive guide to building automation workflows and AI agents using n8n, an open-source visual automation platform. Covers installation options (local, self-hosted, cloud), core concepts like nodes and triggers, and walks through building two practical projects: a weather forecast emailer and an intelligent inbox manager that automatically categorizes emails, creates tasks, and drafts replies using AI models like Google Gemini.
23
5
Article
Zed·27w
Introducing Agent Extensions — Zed's Blog
Zed introduces Agent Server Extensions, allowing one-click installation of ACP-compatible AI coding agents directly in the editor. Three agents are available now: Auggie from Augment Code, OpenCode, and Stakpak. The extensions handle automatic downloads and provide menu integration for starting agent threads. Developers can create their own agent extensions by adding an extension.toml file, an SVG icon, and publishing through Zed's standard process. This builds on the Agent Client Protocol ecosystem, which has grown to include multiple agents and IDE clients including JetBrains.
20
1
6
Article
Daily Dose of Data Science | Avi Chawla | Substack·26w
Agent Protocol Landscape
Three emerging protocols are standardizing the fragmented AI agent ecosystem: AG-UI for agent-user interaction in frontends, MCP (Model Context Protocol) for connecting agents to tools and data, and A2A for multi-agent coordination. These protocols work as complementary layers rather than competing standards, with frameworks like CopilotKit providing a unified interface to build with all three. The convergence enables seamless integration between agentic backends, frontends, tools, and multi-agent systems through open-source implementations.
19
7
Article
The New Stack·28w
OpenAI Co-Founder: AI Agents Are Still 10 Years Away
OpenAI co-founder Andrej Karpathy predicts AI agents are still a decade away from replacing human workers, despite recent progress with large language models. He argues the industry is over-hyping current capabilities, citing the lack of multimodal functionality and continual learning, as well as the significant demo-to-product gap. Karpathy draws on his experience leading Tesla's self-driving efforts to illustrate how difficult it is to move from working demos to production-ready systems. He is now focusing on AI education through Eureka Labs, releasing projects like nanochat to help developers understand LLM implementation from the ground up.
19
1