Best of ai-agents — March 2026

1
Article
Yegor's Blog·10w
Fast Software: More Programmers, Not Fewer
AI coding agents will transform software development similarly to how fast fashion disrupted tailoring. Rather than eliminating programmers, this shift will devalue software craftsmanship and make software disposable and cheap to produce. Large software companies like Oracle, Adobe, and Microsoft will lose their monopoly on complexity as small shops can rebuild entire platforms for a few thousand dollars. The result will be a surge in demand for 'AI operators' — people who direct AI agents to build custom software — creating more jobs than exist today, not fewer.
108
17
2
Article
daily.dev Changelog·8w
daily.dev skills are here
daily.dev launched Skills, a set of plug-and-play integrations that connect AI coding agents to daily.dev's real-time, community-vetted developer content. Three skills are available: daily-dev (personalized feed, bookmarks, search), daily-dev-ask (developer-focused web search grounded in upvoted articles), and daily-dev-agentic (continuous self-improvement via fresh article ingestion). Skills work with Claude Code, Cursor, Codex, and OpenClaw, and require a Plus subscription. Setup takes about 30 seconds via the API settings page.
110
24
3
Article
Storybook·7w
Storybook MCP for React
Storybook MCP is a new Model Context Protocol server for React that gives AI coding agents intelligence about your existing component library. It provides agents with component metadata (stories, API, docs) to reuse existing components instead of generating new patterns, embeds live story previews directly in chat UIs, and enables agents to run component and accessibility tests autonomously — fixing issues or flagging them for human review. Available now in Storybook 10.3 for React, with support for other frameworks coming later. Teams can publish the MCP server remotely via Chromatic to share component context without running Storybook locally.
104
4
4
Article
Claude·10w
Improving skill-creator: Test, measure, and refine Agent Skills
Anthropic has enhanced skill-creator, a tool for building Agent Skills in Claude, with testing and evaluation capabilities. Authors can now write evals to verify skill behavior, run benchmarks tracking pass rate, time, and token usage, and use multi-agent support to run evals in parallel without context bleed. A comparator agent enables A/B testing between skill versions. The update also adds description tuning to improve skill triggering accuracy, reducing false positives and negatives. Two skill types are distinguished: capability uplift skills (teaching Claude new behaviors) and encoded preference skills (sequencing existing capabilities per team workflows), each benefiting from evals differently. The framework is available on Claude.ai, Cowork, and as a Claude Code plugin.
73
3
5
Article
daily.dev·7w
We built an org-wide AI agent in 4 days. Here's what broke in the weeks after.
daily.dev built 'Smith', a 29K-line TypeScript AI agent integrated into their Slack workspace, in just 4 days using Codex. The post covers the production incidents and security challenges that followed: credential leaks in a shared runtime requiring a growing command sanitizer, GitHub tokens bleeding between user sessions, a Node.js event-loop hang that systemd couldn't detect (fixed with a watchdog + health checks), memory exhaustion from a power user's long conversations (fixed with cgroup limits), and a progressive tool disclosure system to manage 60+ tools. Smith authors its own reusable skills via a git-backed 'brain' repo and now runs autonomous nightly tasks like spam sweeps and A/B experiment audits. Known remaining issues include an unaudited skill brain, an incomplete command sanitizer, and an unsolved crash pattern from one heavy user.
97
21
6
Article
ByteByteGo·8w
How Stripe’s Minions Ship 1,300 PRs a Week
Stripe ships over 1,300 fully automated pull requests per week using internal coding agents called Minions. The agents run unattended: they spin up isolated cloud machines in under ten seconds, read documentation, write code, run linters, and submit PRs ready for review. The system rests on four foundational layers: isolated devbox environments built for human engineers long before LLMs existed, hybrid 'blueprint' orchestration that mixes deterministic steps with agentic loops, curated context delivery via scoped rule files and a centralized MCP tool server called Toolshed, and fast feedback loops capped at two CI rounds to avoid diminishing returns. The key insight is that strong developer infrastructure (test suites, isolated environments, fast feedback), not model selection, is the prerequisite for effective coding agents.
64
2
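The 'blueprint' idea in the summary above, deterministic steps wrapped around an agentic step with CI retries capped at two rounds, can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_blueprint`, the callback names, and the status strings are hypothetical and are not taken from Stripe's system; only the two-round cap comes from the summary.

```python
from typing import Callable, Tuple

MAX_CI_ROUNDS = 2  # cap mentioned in the summary; everything else here is hypothetical


def run_blueprint(
    task: str,
    agent_write: Callable[[str, str], str],    # agentic step: (task, feedback) -> patch
    lint: Callable[[str], str],                # deterministic step: patch -> cleaned patch
    run_ci: Callable[[str], Tuple[bool, str]], # deterministic step: patch -> (passed, log)
) -> dict:
    """Alternate an agent step with fixed steps, giving up after MAX_CI_ROUNDS."""
    feedback = ""
    patch = ""
    for round_no in range(1, MAX_CI_ROUNDS + 1):
        patch = lint(agent_write(task, feedback))
        passed, log = run_ci(patch)
        if passed:
            return {"status": "pr_opened", "rounds": round_no, "patch": patch}
        feedback = log  # the CI log becomes context for the next attempt
    return {"status": "needs_human", "rounds": MAX_CI_ROUNDS, "patch": patch}


# Toy stand-ins so the sketch runs without any real agent or CI system.
def toy_agent(task: str, feedback: str) -> str:
    return "  fixed code  " if feedback else "  buggy code  "


def toy_lint(patch: str) -> str:
    return patch.strip()


def toy_ci(patch: str) -> Tuple[bool, str]:
    ok = patch == "fixed code"
    return ok, "" if ok else "test_x failed"


result = run_blueprint("demo-task", toy_agent, toy_lint, toy_ci)
stuck = run_blueprint("demo-task", lambda t, f: "buggy code", toy_lint, toy_ci)
```

In the toy run, the first attempt fails CI and the second passes, so `result` reports two rounds; `stuck` shows the hand-off path once the cap is reached.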
7
Article
daily.dev·7w
How we built a Linear coding agent: the hard parts
daily.dev built Huginn, a coding agent integrated into Linear that automates the full workflow from ticket to PR. The post covers the hard engineering problems encountered: wrapping Claude Code and Codex as child processes with their undocumented streaming formats, building a three-tier fallback parser for structured LLM output, debugging session continuity failures caused by working directory changes, and using Linear labels as a crash-resilient state machine. The team also describes their 'Digital Twin Universe' (DTU) testing approach — in-memory replicas of Linear, GitHub, and KMS running in Docker containers — which made a 99% AI-generated codebase viable. Known limitations include ongoing output parsing fragility, rough BYOK credential handling, and poor fit for tight iterative or architecturally complex tasks.
65
6
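The summary above mentions a three-tier fallback parser for structured LLM output but does not say what the tiers are. The sketch below shows one plausible arrangement, assumed for illustration only: strict JSON first, then JSON inside a fenced block, then the first decodable object embedded in free text. The function name `parse_structured` and the tier labels are hypothetical.

```python
import json
import re

# Matches a triple-backtick fenced block, optionally tagged "json".
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_structured(text: str):
    """Return (tier_name, data), trying progressively looser strategies."""
    # Tier 1 (assumed): the whole output is valid JSON.
    try:
        return "strict", json.loads(text)
    except json.JSONDecodeError:
        pass

    # Tier 2 (assumed): JSON wrapped in a fenced block.
    for body in FENCE.findall(text):
        try:
            return "fenced", json.loads(body)
        except json.JSONDecodeError:
            continue

    # Tier 3 (assumed): first JSON object that decodes from any '{' position.
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch == "{":
            try:
                obj, _ = decoder.raw_decode(text, i)
                return "embedded", obj
            except json.JSONDecodeError:
                continue

    raise ValueError("no JSON object found in model output")


fence = "`" * 3
strict = parse_structured('{"status": "ok"}')
fenced = parse_structured(f'Here you go:\n{fence}json\n{{"status": "ok"}}\n{fence}')
embedded = parse_structured('Result: {"status": "ok"} and that is all.')
```

Returning the tier name alongside the data makes it possible to log how often the looser tiers are needed, which is one way to track the parsing fragility the post lists as a known limitation.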
8
Article
Redpanda·8w
Introducing Redpanda AI SDK for Go
Redpanda has open-sourced an AI SDK for Go designed for production use. The SDK addresses gaps in existing Go AI tooling by providing provider portability across OpenAI, Anthropic, Google Gemini, and AWS Bedrock, idiomatic streaming, composable middleware with layered interceptors, an Agent-to-Agent (A2A) adapter, a flexible tool system with MCP support, and a simulated LLM framework for deterministic testing. It powers Redpanda's own Agentic Data Plane and is available at github.com/redpanda-data/ai-sdk-go.
49
1
9
Article
The New Stack·9w
Andrej Karpathy’s 630-line Python script ran 50 experiments overnight without any human input
Andrej Karpathy released AutoResearch, a 630-line Python script that autonomously ran 50 ML experiments overnight on a single GPU without human input. The core design rests on three primitives: a single editable asset (the training script), a scalar metric (validation bits per byte), and a time-boxed evaluation cycle. A key insight is that a Markdown file called program.md serves as the human-agent interface, encoding search strategy, constraints, and stopping criteria in structured prose. This pattern generalizes beyond ML training to database query optimization, support ticket routing, and RAG pipeline tuning. The human role shifts from running experiments to writing experimental protocols, with the quality of the program.md document becoming the binding constraint on autonomous loop quality. Harrison Chase of LangChain has already adapted the pattern for agent optimization.
45
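The three primitives in the summary above (one editable asset, one scalar metric, a time box) can be illustrated with a small keep-if-better loop. This is not Karpathy's script: `run_loop`, the toy proposal function, and the toy metric are hypothetical stand-ins, and the sketch uses a single overall wall-clock deadline rather than a per-experiment training budget.

```python
import random
import time
from typing import Callable, Tuple


def run_loop(
    asset: dict,
    propose: Callable[[dict, random.Random], dict],  # stands in for the agent editing the asset
    evaluate: Callable[[dict], float],               # scalar metric, lower is better
    max_experiments: int,
    budget_seconds: float,
    seed: int = 0,
) -> Tuple[dict, float, int]:
    """Keep a candidate only if it improves the metric; stop at the deadline."""
    rng = random.Random(seed)
    best, best_score = dict(asset), evaluate(asset)
    deadline = time.monotonic() + budget_seconds
    ran = 0
    while ran < max_experiments and time.monotonic() < deadline:
        candidate = propose(best, rng)
        score = evaluate(candidate)
        ran += 1
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score, ran


# Toy stand-ins: the "asset" is a config dict and the metric is distance from lr = 0.01.
def toy_propose(cfg: dict, rng: random.Random) -> dict:
    return {**cfg, "lr": cfg["lr"] * rng.choice([0.5, 0.9, 1.1, 2.0])}


def toy_metric(cfg: dict) -> float:
    return abs(cfg["lr"] - 0.01)


start = {"lr": 0.1}
best, score, ran = run_loop(start, toy_propose, toy_metric,
                            max_experiments=50, budget_seconds=5.0)
```

In the pattern the article describes, the search strategy and stopping criteria would live in program.md rather than in function arguments; the arguments here only make the sketch self-contained.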
10
Article
Chrome Developers·9w
When to use WebMCP and MCP
WebMCP and MCP serve complementary roles in building agentic web experiences. MCP is a universal backend protocol connecting AI agents to external systems, data sources, and workflows across any platform. WebMCP is a proposed browser standard that exposes frontend tools to browser-based agents, giving them structured, reliable access to live website UI, DOM, session data, and cookies. Key differences: MCP is persistent and platform-agnostic; WebMCP is ephemeral and tab-bound. The recommended approach is to use MCP for core business logic and background tasks, and WebMCP for contextual in-browser interactions when a user is actively on your site.
43
3
11
Article
freeCodeCamp·10w
There are 2 kinds of devs. One of them is screwed. Justin Searls interview [Podcast #210]
Justin Searls, a software engineer who cofounded an agency 15 years ago and retired at 38, discusses how AI agents are reshaping software development. Key themes include the shift from team-based to individual developer work, the importance of verifiability in AI-assisted codebases, and how newer developers can leverage emerging tools to compete with experienced engineers. The podcast also links to community resources including Kubernetes, Notion, Python, and AI agent courses.
43
2
12
Article
Collections·9w
JetBrains Air is now in public preview: an IDE built around AI agents rather than the editor
JetBrains has launched two AI-focused developer tools. Air is an agentic development environment (public preview, macOS only) built on the abandoned Fleet IDE codebase, designed to orchestrate multiple AI agents — including OpenAI Codex, Claude, Gemini, and JetBrains' own Junie — concurrently in isolated sessions. It uses the open Agent Client Protocol (ACP) for vendor-neutral agent communication and bundles terminal, Git, code navigation, and preview in one workspace. Junie CLI is a standalone, LLM-agnostic coding agent for terminals, IDEs, and CI/CD pipelines, supporting models from OpenAI, Anthropic, Google, and Grok via bring-your-own-key. It emphasizes codebase structural understanding to avoid what JetBrains calls 'Shadow Tech Debt,' and includes next-task prediction, MCP support, and one-click migration from Claude Code and Codex. Pricing starts at $10/month. JetBrains positions both tools as neutral infrastructure beneath existing agents rather than direct competitors.
41
6
13
Article
Product Hunt·7w
Maestri: An infinite canvas where coding agents work in concert
Maestri is a native macOS app built by a solo developer that provides an infinite canvas for managing multiple AI coding agents simultaneously. Each terminal session is a visual node that can be freely positioned alongside notes and sketches. The standout feature is agent-to-agent communication: drag a line between two terminals and the agents collaborate directly via PTY orchestration — no APIs or middleware required. Claude Code can talk to Codex, Gemini can delegate to OpenCode, etc. An on-device AI companion called Ombro (powered by Apple Intelligence) monitors all activity and summarizes what happened while you were away. Built in Swift with a custom canvas engine, it requires no account, collects no telemetry, and is priced at $18 lifetime for Pro with one free workspace.
34
14
Article
Databricks·9w
Introducing Kasal
Kasal is a new visual, no-code platform built on Databricks for designing, deploying, and monitoring agentic AI workflows. It uses a drag-and-drop canvas or conversational assistant to let both technical and non-technical users build single and multi-agent systems without writing orchestration code. Under the hood it leverages CrewAI for agent orchestration and integrates with MLflow for tracing, Databricks Apps for deployment, and supports MCP servers, Genie, and custom APIs. Workflows can be exported as code for further customization, and a catalog enables reuse across teams.
27
1
15
Article
Supabase·8w
Supabase joins the Stripe Projects developer preview
Supabase has joined the Stripe Projects developer preview as a co-design partner. Stripe Projects is a new Stripe CLI workflow that lets developers and AI agents provision real services with a single command. Running one command provisions a full Supabase project including Postgres, Auth, Storage, Edge Functions, and Realtime, with credentials automatically written to a local .env file. The integration is deterministic and repeatable, designed to work for both humans and AI agents that cannot interact with browser-based dashboards. Users retain full ownership of their Supabase account and data, and can rotate credentials at any time.
29
1
16
Article
Reid Burke·9w
Worktrunk — Git Worktree Manager for AI Agent Workflows
Worktrunk is a CLI tool that simplifies Git worktree management, specifically designed for running multiple AI coding agents in parallel. It wraps Git's native worktree feature with three core commands (switch, list, merge/remove) to eliminate repetitive path and branch typing. Key features include hooks for automating setup tasks, LLM-generated commit messages, an interactive worktree picker with live diff previews, build cache sharing between worktrees, and direct integration with Claude Code and Codex. Installation is available via Homebrew, Cargo, Winget, or AUR.
27
1
17
Article
Laravel News·10w
Polyscope Is an AI-First Dev Environment for Orchestrating Agents
Beyond Code has launched Polyscope, a macOS tool for orchestrating multiple AI agents simultaneously. Key features include copy-on-write clones for fast agent branching, a built-in preview browser for visual prompting, the ability to query multiple models collaboratively, and support for connecting workspaces to multiple repositories. It offers a free plan alongside paid options for solo developers and teams.
25
1
18
Article
SwirlAI·9w
Agent Skills: Progressive Disclosure as a System Design Pattern
Agent Skills is an open standard released by Anthropic in December 2025 that uses a simple SKILL.md file format to give AI agents modular, progressively loaded capabilities. The format applies the progressive disclosure design pattern to agent context management: at startup only skill names and descriptions are loaded (~80 tokens each), full instructions are activated when relevant, and supporting scripts/docs are pulled in only during execution. This three-tier architecture addresses the context window degradation problem ('lost-in-the-middle') while making agent behavior configurable by non-technical users. OpenAI, Google, GitHub Copilot, and Cursor all adopted the standard within weeks of its release. The pattern generalizes beyond coding agents to any system needing broad capability with focused execution, and AI engineers building non-coding agents must implement the same discovery-activation-execution pipeline themselves.
24
2
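The discovery-activation-execution pipeline described in the summary above can be sketched as an in-memory registry that loads each tier lazily. This is a minimal illustration, not the Agent Skills specification: the `Skill` and `SkillRegistry` classes, the skill names, and the resource names are all hypothetical, and no SKILL.md files are actually parsed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str                       # tier 1: always in context
    load_instructions: Callable[[], str]   # tier 2: read only when activated
    resources: Dict[str, Callable[[], str]] = field(default_factory=dict)  # tier 3


class SkillRegistry:
    def __init__(self, skills: List[Skill]):
        self._skills = {s.name: s for s in skills}
        self.loaded: List[str] = []  # record of what was pulled into context

    def catalog(self) -> str:
        """Discovery: only names and descriptions, cheap enough to load at startup."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def activate(self, name: str) -> str:
        """Activation: fetch the full instructions for one relevant skill."""
        self.loaded.append(f"instructions:{name}")
        return self._skills[name].load_instructions()

    def fetch(self, name: str, resource: str) -> str:
        """Execution: pull a supporting script or doc only when it is needed."""
        self.loaded.append(f"resource:{name}/{resource}")
        return self._skills[name].resources[resource]()


registry = SkillRegistry([
    Skill("pdf-report", "Render a PDF report from a data table",
          lambda: "Step 1: validate columns. Step 2: run render.py.",
          {"render.py": lambda: "print('rendering')"}),
    Skill("changelog", "Draft a changelog from merged PRs",
          lambda: "Step 1: list merged PRs since the last tag."),
])

catalog = registry.catalog()                    # tier 1
instructions = registry.activate("pdf-report")  # tier 2
script = registry.fetch("pdf-report", "render.py")  # tier 3
```

The `loaded` list shows the point of the design: the changelog skill contributes only its one-line description to context because it was never activated.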
19
Article
SkyPilot·8w
Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
Karpathy's autoresearch project lets a coding agent autonomously improve a neural network training script by running experiments in a loop. This post scales that setup by giving Claude Code access to 16 GPUs (H100s and H200s) on a Kubernetes cluster via SkyPilot. Over 8 hours, the agent ran ~910 experiments in parallel waves of 10-13, achieving a 9x throughput increase over single-GPU sequential search. Key findings: parallelism enabled factorial grid search instead of greedy hill-climbing, allowing the agent to discover that scaling model width (aspect ratio 96) outperformed all hyperparameter tuning combined. The agent also autonomously developed a two-tier hardware strategy — screening ideas on cheaper H100s and validating winners on H200s — without being prompted. Total cost was under $300 in GPU compute plus ~$9 in Claude API fees. The full setup is available as an open-source example in the SkyPilot repo.
26
2
20
Article
Tech Lead Digest·11w
How does Docusign have 7,000 employees?
A deep dive into why Docusign employs 7,000 people despite appearing to be a simple e-signature tool, used as a lens to analyze the broader impact of AI coding agents on B2B SaaS. The piece explains Docusign's operational complexity — 35 billion signatures annually across 180 countries, multi-cloud infrastructure, legal compliance, and 1.8 million paying customers — justifying its headcount. It then explores how AI coding agents will reshape SaaS business models: commoditizing software creation, threatening per-seat pricing, shifting value to data ownership and mission-critical integrations, and potentially compressing SaaS profit pools. Perspectives from investors, analysts, and founders suggest software isn't dying but will face structural changes in the 2030s, with defensible moats coming from proprietary data, regulatory complexity, and deep customer integrations rather than the software layer itself.
24
4
21
Video
DevOps Toolbox·10w
Stop Using Git Worktrees. Do THIS Instead.
Git worktrees allow parallel development across multiple branches without stashing, but their CLI UX is notoriously painful. Worktrunk is a new CLI tool that wraps Git worktrees with a much smoother interface: it handles branch creation, switching, and merging (including auto-staging, AI-generated commit messages via Claude, stash management, and cleanup) in single commands. Key features include hooks for automating tasks on worktree events (e.g., running npm install, renaming tmux windows), fuzzy-finding via jq/fzf integration, PR resolution via GitHub CLI, copying gitignored files across worktrees, and per-step merge control. The tool is positioned as especially useful for AI agent-based parallel development workflows.
22
22
Article
ASP.NET Blog·7w
Generative AI for Beginners .NET: Version 2 on .NET 10
Version 2 of the free open-source course 'Generative AI for Beginners .NET' has been released, completely rebuilt on .NET 10. The curriculum is restructured into five focused lessons covering generative AI fundamentals, practical techniques (chat completions, prompt engineering, RAG, function calling), AI application patterns, multi-agent systems using the Microsoft Agent Framework, and responsible AI. The primary AI abstraction has shifted from Semantic Kernel to Microsoft.Extensions.AI (MEAI), which aligns with .NET 10 patterns like dependency injection. RAG samples have been rewritten using native SDKs, 11 legacy Semantic Kernel samples moved to deprecated, and all eight language translations updated.
22
23
Article
Tech World With Milan·7w
Agentic code workflows with Nick Tune
Nick Tune, Senior Staff Software Engineer at PayFit, shares his advanced agentic coding workflow built around deterministic, state-machine-driven processes. Key practices include modeling the dev workflow as a typed state machine with unit-tested transitions, using PRD expert agents for structured planning before writing any code, enforcing architecture rules via lint and dependency-cruiser checks rather than relying on AI compliance, running layered code reviews with CodeRabbit and local review agents, and driving Claude through a strict TDD red-green cycle with verified pre/post-conditions. The approach prioritizes determinism over prompt-based trust, using pre-commit hooks and banned commands to prevent AI from bypassing guardrails.
33
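A typed state machine with guarded transitions, as described in the summary above, might look like the following. This is a sketch under assumptions: the stage names, the allowed-transition table, and the `tests_failing` guard are hypothetical and are not Nick Tune's actual workflow definition.

```python
from enum import Enum


class Stage(Enum):
    PLANNING = "planning"
    RED = "red"        # a failing test has been written
    GREEN = "green"    # the test passes
    REVIEW = "review"
    DONE = "done"


# Hypothetical transition table; anything not listed is rejected.
ALLOWED = {
    Stage.PLANNING: {Stage.RED},
    Stage.RED: {Stage.GREEN},
    Stage.GREEN: {Stage.RED, Stage.REVIEW},
    Stage.REVIEW: {Stage.RED, Stage.DONE},
    Stage.DONE: set(),
}


def transition(current: Stage, target: Stage, tests_failing: bool) -> Stage:
    """Return the new stage, or raise ValueError if a rule or pre-condition fails."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    # Pre-condition: RED may be left only after a test has actually been seen failing.
    if current is Stage.RED and target is Stage.GREEN and not tests_failing:
        raise ValueError("cannot leave RED without a failing test")
    # Pre-condition: review starts only on a passing suite.
    if target is Stage.REVIEW and tests_failing:
        raise ValueError("cannot enter REVIEW with failing tests")
    return target


stage = transition(Stage.PLANNING, Stage.RED, tests_failing=False)
stage = transition(stage, Stage.GREEN, tests_failing=True)
stage = transition(stage, Stage.REVIEW, tests_failing=False)
```

Because the rules live in plain data and a pure function, they can be unit-tested directly, which is the determinism-over-prompting point the summary makes.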
24
Article
ByteByteGo·8w
EP207: Top 12 GitHub AI Repositories
A curated list of 12 popular GitHub AI repositories ranked by stars, including Ollama, LangChain, Dify, Open WebUI, DeepSeek-V3, Claude Code, CrewAI, and others. Also covers where different test types fit in a testing strategy (unit, integration, E2E), how SSO works step by step using SAML/OIDC, how LLMs orchestrate multi-agent deep research workflows, and six common password attack techniques.
31
1
25
Article
Samuel Adekunle·8w
Stitch + Antigravity + Flutter: Build Apps with AI Agents in 2026
A walkthrough of an AI-assisted Flutter app development workflow using Google's Stitch (AI design agent) and Antigravity (agentic IDE). Stitch generates UI designs from text prompts, which are then exported via MCP connectors to Antigravity, where an AI agent writes the full Flutter/Dart codebase following Clean Architecture with Riverpod. The demo builds a Daily Habit Tracker app from prompt to running emulator in about 10-12 minutes. Includes setup steps, example prompts, best practices, and limitations of the agentic approach.
22
6
