Best of ai-agents · April 2026

  1. Article · Paolo Perrone · 3w

    Google just open-sourced agents-cli.

    Google has open-sourced agents-cli, a command-line tool that turns coding agents like Claude Code, Gemini CLI, and OpenAI's Codex CLI into full agent engineers. With a single setup command, these agents gain the ability to scaffold projects, write ADK Python code, run evaluations, and deploy to Cloud Run — enabling a single natural language prompt to build and deploy a complete agent.

  2. Article · Collections · 4w

    Claude Opus 4.7 announced with improved instruction-following and self-verification

    Anthropic has released Claude Opus 4.7, featuring improvements in agentic coding, long-running task handling, and multimodal understanding. Key benchmarks include 64.3% on SWE-bench Pro and 87.6% on SWE-bench Verified, with early testers reporting 10–14% gains on coding tasks. Image support now handles up to 2,576px on the long edge. A new `xhigh` effort level has been added, and Claude Code's default effort has been raised from `medium` to `xhigh`, which may increase token usage. The context window is 1M tokens with pricing unchanged at $5/M input and $25/M output. A new tokenizer may increase token counts by 1.0–1.35x. The model is available across Claude plans, API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Vercel AI Gateway, GitHub Copilot, Devin, v0, and more.
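
The quoted pricing and tokenizer range make the cost impact easy to bound with back-of-envelope arithmetic. A small sketch — the request sizes below are made up for illustration; only the prices and the 1.35x upper bound come from the announcement:

```python
# Back-of-envelope cost impact of the new tokenizer, using the quoted
# pricing ($5/M input, $25/M output). 1.35 is the upper bound of the
# reported 1.0-1.35x range; the request sizes are invented.

INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int,
                 tokenizer_factor: float = 1.0) -> float:
    """Dollar cost of one request, scaling counts by the tokenizer factor."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) * tokenizer_factor

base = request_cost(50_000, 4_000)         # old tokenizer
worst = request_cost(50_000, 4_000, 1.35)  # worst-case token inflation

print(f"before: ${base:.4f}, worst case after: ${worst:.4f}")
```

The same factor applies on top of the raised default effort level, so heavy Claude Code users may want to budget for both effects together.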

  3. Article · InfoWorld · 5w

    Multi-agent is the new microservices

    Multi-agent AI systems are being over-adopted in the same way microservices were — applied broadly before teams have problems that actually warrant the complexity. Anthropic, OpenAI, Microsoft, and Google all advise starting with the simplest solution: a single optimized LLM call, then retrieval, then tools, then a single agent loop. Only add a second agent when you can clearly identify parallelizable tasks, context pollution, or specialization needs. Multi-agent architectures cost significantly more in tokens, observability, error handling, and maintenance. Most enterprise teams don't yet have problems worth decomposing across agents, and adding agents won't fix weak retrieval, vague tools, or poor documentation — it will amplify those problems.

  4. Article · Addy Osmani · 5w

    Agentic Engine Optimization (AEO)

    Agentic Engine Optimization (AEO) is a new discipline for structuring technical documentation so AI coding agents can effectively discover, parse, and use it. Unlike human readers, agents like Claude Code, Cursor, and Cline compress multi-page navigation into single HTTP requests, bypass all client-side analytics, and silently discard content that exceeds their context windows. Key AEO practices include: auditing robots.txt to allow AI agent traffic, adding an llms.txt sitemap for agent discovery, writing skill.md files that declaratively describe API capabilities, keeping documentation pages under token limits (15K–25K tokens for most pages), serving clean Markdown alongside HTML, surfacing token counts as metadata, adding 'Copy for AI' buttons, and creating AGENTS.md files in repositories. A companion open-source audit tool called agentic-seo automates checking for these signals.
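
The token-budget practice above can be approximated with a crude heuristic. A sketch assuming the common ~4-characters-per-token rule of thumb — real audits should count with the target model's actual tokenizer:

```python
# Rough page-budget check in the spirit of the AEO advice: flag doc pages
# likely to exceed an agent's comfortable context slice. The 4-chars-per-
# token heuristic is an approximation, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def within_budget(text: str, budget: int = 25_000) -> bool:
    """True if the page likely fits the suggested 15K-25K token ceiling."""
    return estimate_tokens(text) <= budget

page = "agent-readable documentation " * 2_000  # ~58K characters
print(estimate_tokens(page), within_budget(page))
```

Surfacing the resulting estimate as page metadata is exactly the "token counts as metadata" practice the post recommends.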

  5. Article · Collections · 2w

    Warp terminal goes open-source under AGPL, with OpenAI as founding sponsor

    Warp, the Rust-based AI terminal, has open-sourced its client code under a split license: MIT for UI framework crates and AGPL v3 for everything else. OpenAI is the founding sponsor, and agentic workflows run through Warp's proprietary cloud platform Oz. Key limitations include AI features still depending on Warp's backend, with bring-your-own-key model freedom locked behind paid plans. The community has already forked the project as OpenWarp to enable unrestricted OpenAI-compatible endpoints. CEO Zach Lloyd frames this as a competitive business move rather than a philosophical shift, aiming to attract developers who avoid proprietary tools while the developer tooling space consolidates rapidly following Roo Code's sunset and Cursor's acquisition.

  6. Article · Databricks · 5w

    How agentic software development will change databases

    Agentic software development is reshaping database requirements in three key ways: evolutionary branching (agents create and discard database branches rapidly, with some projects reaching 500+ iterations), scale-to-zero economics (half of agentic app databases have compute lifetimes under 10 seconds, making fixed-cost databases unviable), and openness (agents trained on open-source ecosystems like Postgres operate more reliably with open interfaces and storage formats). Databricks's Lakebase addresses these needs with O(1) copy-on-write branching, sub-second scale-to-zero elasticity, and open Postgres page format storage on cloud object storage. A telling data point: AI agents in Lakebase now create roughly 4x more databases than human users.
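
The O(1) copy-on-write branching idea can be sketched in a few lines. This is an illustrative toy, not Lakebase's actual storage engine: a fork is just a pointer to the parent's pages, and only pages written on the branch are ever copied:

```python
# Illustrative copy-on-write branching: fork is O(1) because no page data
# moves; reads fall through to the parent until a page is written locally.

class Branch:
    def __init__(self, parent=None):
        self.pages = {}        # pages modified on this branch only
        self.parent = parent   # shared, unmodified pages live up the chain

    def fork(self) -> "Branch":
        """O(1): copies a reference, not data."""
        return Branch(parent=self)

    def read(self, page_id):
        if page_id in self.pages:
            return self.pages[page_id]
        return self.parent.read(page_id) if self.parent else None

    def write(self, page_id, data):
        self.pages[page_id] = data  # copy-on-write: touch only this page

main = Branch()
main.write("p1", "rows v1")
dev = main.fork()            # instant branch, nothing copied
dev.write("p1", "rows v2")   # diverges without touching main
print(main.read("p1"), dev.read("p1"))
```

With forks this cheap, the 500+-iteration branching behavior described above stops being pathological and becomes the normal workflow.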

  7. Article · Collections · 4w

    Codex adds computer use, browser, image generation, and 90+ plugins

    GPT-5.5 is now live in ChatGPT, Codex, and the API (as gpt-5.5 and gpt-5.5-pro), rolling out across GitHub Copilot, Cursor, Devin, and other tools. The model targets long-horizon agentic tasks, posting strong benchmark scores including 82.7% on Terminal-Bench 2.0 and 73.1% on Expert-SWE. Codex 3.0 ships alongside it with major new capabilities: computer use on Mac, an in-app browser, image generation, document support, memory, auto-review mode, SSH into devboxes, and 90+ new plugins. NVIDIA deployed Codex to 10,000+ employees on GB200 infrastructure. OpenAI launched a formal enterprise partner program with Cognizant, Accenture, Capgemini, and others. API pricing increased, though OpenAI argues improved task completion reduces total token usage. Early user reports highlight less boilerplate code, faster responses, and better task persistence without re-prompting.

  8. Article · Where's Your Ed At · 5w

    AI Is Really Weird

    A critical analysis of the AI industry's current state, arguing that the hype far outpaces reality. Key points include: AI 'agents' are fundamentally just chatbots connected to APIs with limited real-world capability; LLM-generated code creates security vulnerabilities and review backlogs rather than productivity gains; AI shows no meaningful presence in productivity data despite hundreds of billions in investment; Microsoft labels Copilot 'for entertainment purposes only' while selling it to governments; Anthropic and OpenAI use non-standard accounting to obscure massive losses, with Anthropic's rapid revenue growth figures appearing mathematically inconsistent with its CFO's sworn testimony of $5 billion in lifetime revenue; and mainstream media largely ignores or normalizes these financial red flags.

  9. Article · Zed · 3w

    Introducing Parallel Agents in Zed

    The latest version of the open-source code editor introduces parallel agent support, allowing multiple AI agents to run simultaneously in a single window. A new Thread sidebar lets developers manage, monitor, and organize agent threads by project, control which folders and repos each agent can access, and mix different agents per thread. The update also ships a redesigned default layout that puts agent threads front and center. The feature is framed around 'agentic engineering' — combining human craftsmanship with AI tooling rather than fully delegating to AI.

  10. Video · Continuous Delivery · 5w

    The Junior Developer CRISIS: How to Build a Team When AI Does the Entry-Level Work

    A 30-year software engineering veteran argues that comparing LLMs/AI agents to junior developers is fundamentally wrong and does a disservice to both. Junior developers are curious, eager to learn, retain knowledge, and grow — they are humans at the 'conscious incompetence' stage. AI agents, by contrast, are transactional, stateless, lack memory across sessions, have no accountability, and don't care about your codebase or users. The author coins the analogy of 'Colin the contractor' — brilliant for narrow, well-defined tasks but unreliable and mercenary. Practical advice includes: give AI small, clearly articulated steps with frequent validation; give junior devs breakable toys, pair programming, and actionable feedback. The author warns that people equating the two either treat junior devs as robots or want to justify replacing them with AI — both problematic. The post ends with a tip to de-anthropomorphize AI interactions by configuring it to respond like a text-based adventure game.

  11. Video · Fireship · 5w

    Cursor ditches VS Code, but not everyone is happy...

    Cursor 3.0 marks a major shift from its VS Code fork origins, now rewritten from scratch in Rust and TypeScript with a focus on orchestrating swarms of AI agents across multiple repos, machines, and cloud environments simultaneously. The release also introduced Composer 2, an in-house coding model that sparked controversy after it was revealed to be based on Moonshot's Kimi K2 model — a fact Cursor initially obscured, later apologizing for the lack of transparency. The new interface de-emphasizes manual coding in favor of agent management, featuring parallel agent monitoring, built-in browser, design mode, and remote SSH support. Not everyone is enthusiastic about this direction, with some critics comparing it too closely to OpenAI Codex.

  12. Article · XDA Developers · 3w

    I’d do these 5 things differently if I started self-hosting LLMs today

    Lessons learned from months of self-hosting LLMs distilled into five practical changes: adopting Docker-only deployment for stability, documenting every configuration detail from the start, building agent-first infrastructure with tools like AgenticSeek and n8n instead of just chat interfaces, avoiding model hoarding by keeping only a few reliable models, and focusing on workflow integration so the LLM is embedded in daily work rather than a separate destination.

  13. Article · A Java geek · 5w

    A GitHub agentic workflow

    GitHub agentic workflows combine standard GitHub Actions with an AI agent (powered by Copilot) to handle semi-structured or unstructured data tasks. The author describes a real use case: automating the parsing of product release notes to generate upgrade analysis config files — something impossible with deterministic regex-based automation. Key steps covered include initializing workflows via the `gh aw` CLI extension, writing workflows in Markdown and compiling them to YAML, and using a fine-grained `GITHUB_COPILOT_TOKEN`. Practical pitfalls are shared: forgetting to compile Markdown to YAML before pushing, Windows/Linux line-ending issues requiring a `.gitattributes` fix, security concerns around auto-compiling workflows, and the inability to use GitHub Marketplace actions inside agentic workflows. The system prompt used at runtime is also shared, highlighting security hardening and prompt injection defenses.
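
A typical `.gitattributes` fix for this class of Windows/Linux line-ending problem looks like the following — the post's exact rules may differ:

```
# .gitattributes — normalize line endings so Windows checkouts don't
# break the Markdown sources or the compiled YAML
* text=auto
*.md text eol=lf
*.yml text eol=lf
```

Forcing LF on the compiled files also keeps the Markdown-to-YAML compilation step deterministic across contributors' machines.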

  14. Article · Anmol Baranwal · 2w

    Generative UI explained without the hype

    Generative UI is a spectrum of three patterns for how AI agents control UI: Controlled (agent picks from predefined components), Declarative/A2UI (agent selects from a schema-driven catalog), and Open-ended (agent generates raw HTML or controls external apps via MCP). Each pattern trades design control for flexibility. CopilotKit supports all three via the AG-UI protocol, which streams events between agent and frontend. The post demystifies the vague term and explains when each pattern is appropriate, with code examples for each approach.
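
The "Controlled" end of the spectrum can be sketched as a component registry the agent must pick from, so the frontend retains full control of what can render. The names below are illustrative, not CopilotKit's actual API:

```python
# Controlled generative UI sketch: the agent's output is validated against
# a predefined component registry before anything is rendered.

REGISTRY = {
    "weather_card": {"city", "temp_c"},
    "order_status": {"order_id", "status"},
}

def render_request(component: str, props: dict) -> dict:
    """Accept an agent's UI choice only if it matches the registry."""
    if component not in REGISTRY:
        raise ValueError(f"unknown component: {component}")
    unknown = set(props) - REGISTRY[component]
    if unknown:
        raise ValueError(f"unexpected props: {sorted(unknown)}")
    return {"component": component, "props": props}

ui = render_request("weather_card", {"city": "Oslo", "temp_c": 3})
print(ui)
```

The Declarative and Open-ended patterns relax the registry check in exchange for flexibility — which is exactly the trade-off the post maps out.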

  15. Video · ByteMonk · 5w

    Claude's Internal Architecture Revealed | How AI Agents Actually Work

    Anthropic accidentally shipped TypeScript source maps with Claude Code's CLI distribution, exposing the full internal source code. Engineers reverse-engineered the architecture and rebuilt it in Python, then Rust, within 24 hours. The core architecture consists of an agent loop, a tool registry with 20+ tools, a hooks middleware system for safety and observability, a memory compaction mechanism for long sessions, context loading via CLAUDE.md and skills files, and sub-agent spawning for parallel task delegation. Each component maps to familiar distributed systems patterns: the loop is a task queue worker, tools are a service interface, hooks are middleware, memory compaction is log rotation, and sub-agents are worker nodes.
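
The loop/registry/hooks trio maps neatly onto a few lines of Python. This is a toy to make the distributed-systems analogy concrete, not Claude Code's actual source:

```python
# Toy agent architecture: a task-queue-style loop, a tool registry as the
# service interface, and hooks as middleware run before every tool call.

TOOLS = {
    "echo": lambda args: args["text"],
    "add": lambda args: args["a"] + args["b"],
}

HOOKS = []  # middleware for safety and observability

def run_tool(name, args):
    for hook in HOOKS:
        hook(name, args)  # e.g. block dangerous calls, record telemetry
    return TOOLS[name](args)

def agent_loop(plan):
    """The loop is a task-queue worker: pop a step, dispatch to a tool."""
    results = []
    for name, args in plan:  # a real loop would ask the model for each step
        results.append(run_tool(name, args))
    return results

HOOKS.append(lambda name, args: print(f"[hook] calling {name}"))
print(agent_loop([("echo", {"text": "hi"}), ("add", {"a": 2, "b": 3})]))
```

Memory compaction and sub-agent spawning would layer on top of this skeleton, as log rotation and worker-node fan-out respectively.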

  16. Article · Product Hunt · 6w

    Baton: Orchestrate your AI coding agents

    Baton is a desktop app for orchestrating multiple AI coding agents in parallel. Each agent runs in its own git-isolated workspace, with smart notification badges to flag which agents need attention. It supports Claude Code, Codex, OpenCode, and any terminal-based agent. Features include diff review, file browsing, codebase search, and a built-in MCP server that lets agents spawn new agents. Built by a developer who needed a single unified interface to manage multiple agents without constant window-switching.

  17. Article · Collections · 3w

    DESIGN.md: an open standard for describing design systems to AI agents

    Google's Stitch team has open-sourced DESIGN.md, a file format specification that enables AI agents to read and act on design systems. The format pairs structured design tokens (hex codes, font sizes) with human-readable rationale explaining the reasoning behind design decisions. The GitHub-published spec includes a token section, a work-in-progress components section for role-based references, and a CLI tool that validates files against the spec with WCAG contrast ratio checks. The goal is to provide a shared foundation so AI agents and tools can consistently generate and consume design systems without each platform inventing its own approach.
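
An illustrative fragment in the spirit of the format — not the published spec verbatim — showing the pairing of structured tokens with human-readable rationale:

```markdown
<!-- Illustrative only; consult the published DESIGN.md spec for the real format -->
## Tokens

- `color.primary`: `#1A73E8` — brand blue; reserved for primary actions so
  destructive actions stay visually distinct.
- `font.size.body`: `16px` — minimum body size, chosen to keep text readable
  on small screens.
```

The rationale half is what lets an agent decide when a token applies, rather than just copying hex codes.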

  18. Article · Redpanda · 4w

    OpenClaw is not for enterprise scale

    Running AI coding agents like OpenClaw (a thinly veiled reference to Claude Code) in enterprise environments without proper security architecture is fundamentally unsafe. Sandboxing alone is insufficient because credentials are already inside the sandbox. A proper enterprise-grade agentic architecture requires four components: a gateway as a single choke point for all agent access with full observability and kill-switch capability, audit logs and full transcripts capturing reasoning chains and tool calls, a token vault that keeps credentials out-of-band so agents never directly hold secrets, and sandboxed compute with strictly limited network access routed through the gateway. Redpanda demonstrates this with their 'agentic gateway interface' (agi) CLI. The core principle: agents can't leak credentials they never possess.
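
The "never possess" principle reduces to a reference swap at the gateway: the agent holds only an opaque vault reference, and the gateway resolves it to the real secret at the network boundary. A toy sketch, not Redpanda's actual agi implementation — the vault contents here are fake:

```python
# Out-of-band token vault sketch: the agent side of the boundary never
# contains the secret, so a compromised agent has nothing to leak.

VAULT = {"github-ref-42": "ghp_fake_secret_token"}  # gateway-side only

def agent_builds_request():
    """The agent sees a vault reference, never the credential itself."""
    return {"url": "https://api.github.com/user", "auth_ref": "github-ref-42"}

def gateway_forward(request):
    """Single choke point: resolve the reference, audit, then forward."""
    secret = VAULT[request["auth_ref"]]
    headers = {"Authorization": f"Bearer {secret}"}
    audit = {"url": request["url"], "auth_ref": request["auth_ref"]}  # log the ref, not the secret
    return headers, audit

req = agent_builds_request()
headers, audit = gateway_forward(req)
assert "ghp_" not in str(req)  # nothing in the agent's hands to exfiltrate
print(audit)
```

The audit record doubles as the full-transcript requirement: every tool call crosses the same choke point, so the kill switch and the log live in one place.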

  19. Article · ByteByteGo · 3w

    The Security Architecture of GitHub Agentic Workflow

    GitHub built a layered security architecture for AI agents running inside GitHub Actions, designed around the assumption that the agent is already compromised. The architecture has three independent layers: a substrate layer using Docker containers and kernel-level isolation, a configuration layer that compiles workflows with explicit permissions and keeps secrets physically unreachable from the agent, and a planning layer that stages outputs for deterministic vetting before they affect real state. Key mechanisms include a secretless agent container topology using proxies and gateways, a safe outputs pipeline that enforces allowlists, quantity limits, and content sanitization, and comprehensive logging at every trust boundary. The post also discusses trade-offs: strict-by-default sandboxing limits flexibility, prompt injection remains fundamentally unsolved, and the architecture is complex enough that it may not suit simpler use cases.
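
The safe-outputs idea — deterministic vetting of staged agent outputs before they touch real state — can be sketched as follows. Illustrative only, not GitHub's implementation:

```python
# Safe-outputs sketch: staged outputs pass an allowlist, a quantity limit,
# and content sanitization before any side effect is applied.

import re

ALLOWED_ACTIONS = {"create-issue", "add-comment"}
MAX_OUTPUTS = 5

def vet_outputs(staged):
    if len(staged) > MAX_OUTPUTS:
        raise ValueError("quantity limit exceeded")
    vetted = []
    for item in staged:
        if item["action"] not in ALLOWED_ACTIONS:
            raise ValueError(f"action not allowlisted: {item['action']}")
        # redact anything shaped like a leaked token before it leaves the sandbox
        body = re.sub(r"ghp_\w+", "[REDACTED]", item["body"])
        vetted.append({"action": item["action"], "body": body})
    return vetted

out = vet_outputs([{"action": "add-comment", "body": "found ghp_abc123 in logs"}])
print(out)
```

Because the vetting step is deterministic, it holds even under the architecture's core assumption that the agent producing the staged outputs is already compromised.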

  20. Article · Supabase · 5w

    AI Agents Know About Supabase. They Don't Always Use It Right.

    Supabase has released Agent Skills, an open-source set of instructions that teach AI coding agents how to use Supabase correctly. The problem: agents already know about Supabase from training data, but they make critical mistakes — skipping RLS policies, hallucinating CLI commands, creating views without security_invoker=true, and ignoring up-to-date docs. The skill is a ~100-line SKILL.md file covering four areas: documentation access (teaching agents to fetch current docs via MCP, curl, or web search), security (inline checklist of RLS and auth gotchas), tooling (CLI --help discovery and MCP troubleshooting), and schema management (direct SQL edits during dev, then formalize migrations). Testing across Claude Code and Codex showed consistent improvement when using MCP + Skill vs. baseline or MCP alone, with task completion rates rising from 42–71% to 67–88%. Key insight: the bottleneck is context, not capability — agents knew how to implement security_invoker correctly when the skill was loaded, they just didn't know when to apply it.

  21. Video · DevOps Toolbox · 4w

    CMUX: Too Much Hype?

    A hands-on critical review of CMUX (Simax), a new Mac-only terminal multiplexer built on top of Ghostty, aimed at AI agent workflows. The reviewer explores its workspace management, embedded browser tabs, notification system, CLI API, and SSH features, comparing them extensively to tmux with plugins like Sesh and Session X. While acknowledging CMUX has unique features (embedded browser, agent notifications, scriptable API), the reviewer finds it too rough around the edges, Mac-only, and not compelling for experienced tmux or Ghostty power users. The conclusion is that CMUX may appeal to newer developers forced into the terminal by AI agents, but doesn't justify the hype for seasoned terminal users.

  22. Video · The Serious CTO · 4w

    Code Review Is Broken - Here's What Elite Teams Do Instead

    Traditional code review processes are fundamentally broken, especially in the age of AI-generated code. The 'LGTM syndrome' — rubber-stamp approvals — creates an illusion of safety rather than real quality. AI coding agents now generate code far faster than humans can meaningfully review it, with AI-generated code producing 1.7x more issues per PR. The solution involves several shifts: keeping PRs small and short-lived, designing architectures for modifiability, replacing the gatekeeper model with a mentoring model, using synchronous collaboration like mob programming, maintaining healthy senior-to-junior ratios (1:2 to 1:4), adopting inner sourcing to prevent knowledge silos, and treating automated testing as a first-class architectural requirement. The goal is building engineers who understand the system deeply enough that reviews become a formality, not a bottleneck.

  23. Article · Addy Osmani · 2w

    Long-running Agents

    Long-running AI agents that operate over hours, days, or weeks represent the next evolution beyond single-session chat-based agents. Three core engineering challenges must be solved: finite context windows, lack of persistent state, and unreliable self-verification. Addy Osmani surveys how Anthropic, Cursor, and Google have converged on similar architectural patterns — separating the model loop (brain) from execution sandboxes (hands) and durable session logs — while differing in surface area and productization. Practical patterns covered include the Ralph loop (a simple bash-based task loop), checkpoint-and-resume, human-in-the-loop delegation, memory-layered context, ambient processing, and fleet orchestration. Key takeaways: define done-conditions before the agent starts, separate evaluator from generator, invest in append-only session logs, and treat context resets as first-class operations. Real limitations remain around cost, security, alignment drift, and verification overhead.
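
Checkpoint-and-resume over an append-only session log can be sketched in a few lines: on restart the agent replays the log to recover durable state instead of trusting its context window. An illustrative toy, not any vendor's implementation:

```python
# Append-only session log sketch: events are only ever appended, and
# resume() rebuilds state by replay -- a context reset costs nothing.

import json

def append_event(log, event):
    """Append-only: events are never mutated or deleted."""
    log.append(json.dumps(event))
    return log

def resume(log):
    """Rebuild the set of completed tasks by replaying the log."""
    done = set()
    for line in log:
        event = json.loads(line)
        if event["type"] == "task_done":
            done.add(event["task"])
    return done

log = []
append_event(log, {"type": "task_done", "task": "write-tests"})
append_event(log, {"type": "note", "task": "refactor"})
append_event(log, {"type": "task_done", "task": "refactor"})
print(resume(log))  # completed tasks survive a context reset
```

This is also why the post treats context resets as first-class operations: with replayable state, a reset is a cheap, routine event rather than a failure.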

  24. Article · Callstack Blog · 4w

    Agent React DevTools: Debug React Apps with AI Agents

    Callstack has released Agent React DevTools, a CLI that gives AI agents direct access to React DevTools internals — including the component tree, state, profiling data, renders, and performance hotspots. Unlike UI tree inspection alone, this enables AI agents to understand why an app behaves a certain way, not just what it looks like. It integrates as a skill for AI agents and supports React and React Native apps. Integration with third-party plugins via Rozenite is also available, with plans to unify both into a single CLI.

  25. Video · Theo - t3․gg · 4w

    I think every company should open source their code.

    A strong argument for why companies should open source their software, framed around the emerging 'building block economy.' The core thesis is that AI agents prefer open, modular, well-documented components over closed commercial software, and that the future of competitive advantage lies in letting customers fork and customize your product rather than building every feature yourself. Uses T3 Code's 1,500 forks and Mitchell Hashimoto's Ghostty/libghostty growth data as evidence. Also introduces the concept of a 'patch.md' file — a plain-English description of user customizations that enables AI-assisted merge conflict resolution when upstream updates break forks — as a path toward self-forking, self-healing software.