Best of ai-agents · February 2026

  1. Article · Justin Searls · 12w

    Brace for the Fuckening

    A developer's blunt take on the economic consequences if AI investments actually pay off. The author argues that tech CEOs are being dishonest about job creation, that white-collar workers in accounting, law, and consulting face severe displacement, and that programmers will remain busy but spread thinner across more projects. The post warns of 'The Fuckening'—a macroeconomically significant collapse in high-paying office jobs—and offers individualized survival advice: quantify your direct contribution to revenue, position yourself close to the money, and continuously reassess your value as AI tools improve.

  2. Article · Addy Osmani · 11w

    Stop Using /init for AGENTS.md

    Auto-generated AGENTS.md files (produced via /init) hurt AI coding agent performance and inflate costs by 20%+ because they duplicate information agents can already discover by reading the codebase. Two 2026 research papers show LLM-generated context files reduce task success while increasing cost, whereas human-written files help only when they contain non-discoverable information like tooling gotchas, non-obvious conventions, and hidden landmines. The right mental model is to treat AGENTS.md as a minimal, living list of codebase friction points that can't be inferred—not a comprehensive onboarding document. Every discoverable line is noise that competes with the actual task via context dilution. A better architecture involves a routing layer with dynamically loaded, task-specific context rather than a monolithic static file, though tooling support for this is still lacking.
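A concrete illustration of that mental model (contents invented for this digest): an AGENTS.md reduced to friction points an agent could not discover by reading the code, with everything discoverable left out:

```markdown
# AGENTS.md

- `make test` silently skips integration tests unless DOCKER_HOST is set.
- The `legacy/` directory is generated; edit `codegen/templates/` instead.
- CI pins Node 18; syntax from Node 20+ passes locally but fails in CI.
```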

  3. Article · Supabase · 14w

    BKND joins Supabase

    Dennis Senn, creator of BKND, is joining Supabase to develop a Lite offering tailored for agentic workloads. The focus is on building lightweight backend infrastructure with features like trimmed-down sandboxes, specialized database architectures for AI agents, and simpler, more affordable databases. BKND will remain open source while the team explores the best approach to creating a lightweight Supabase experience in a separate repository.

  4. Article · neo4j · 12w

    I Built a Tiny AI Agent just for fun

    A developer built a minimal AI agent using Neo4j Aura Agents to answer a single question: whether a startup idea is crowded or has unmet opportunities. The graph had just six nodes and three relationship types. Key decisions included skipping embeddings in favor of explicit graph traversal, removing Text2Cypher to prevent hallucination, and using two deterministic Cypher-based tools with strict parameter binding. The result was a fully traceable reasoning system where every output maps directly to graph relationships, demonstrating that for closed domains with fixed schemas, explicit traversal can replace probabilistic reasoning.
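The post's two tools aren't reproduced here, but the general pattern of a deterministic, parameter-bound query tool can be sketched as follows (query text and schema are invented; the agent picks a named query and supplies typed parameters, and never writes Cypher itself):

```python
# Sketch of a deterministic Cypher tool with strict parameter binding.
QUERIES = {
    "competitors_in_space": {
        "cypher": ("MATCH (s:Startup)-[:OPERATES_IN]->(m:Market {name: $market}) "
                   "RETURN s.name"),
        "params": {"market": str},
    },
}

def build_query(tool_name: str, **params):
    """Validate parameters against the tool's schema, then return the
    fixed Cypher text plus the bound parameter dict."""
    spec = QUERIES[tool_name]
    expected = spec["params"]
    if set(params) != set(expected):
        raise ValueError(f"expected parameters {sorted(expected)}, got {sorted(params)}")
    for name, typ in expected.items():
        if not isinstance(params[name], typ):
            raise TypeError(f"parameter {name!r} must be {typ.__name__}")
    return spec["cypher"], params
```

Because the Cypher text is fixed and only parameters vary, every output traces back to an explicit graph traversal rather than generated query text.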

  5. Article · Product Hunt · 14w

    spacecake: the best interface for Claude Code

    spacecake is an open-source desktop application that provides a visual interface for Claude Code agents. It combines a Notion-style markdown editor with an integrated Ghostty terminal, allowing developers to run AI agents while editing markdown content. The app includes a real-time status bar showing context window usage and costs, plus a tasks panel for monitoring agent activities and upcoming actions.

  6. Video · ByteMonk · 11w

    OpenClaw: The Most Dangerous AI Project on GitHub?

    OpenClaw is a self-hosted AI agent with 200,000+ GitHub stars that connects to messaging apps, file systems, and terminals to act autonomously. Its architecture uses four layers: a WebSocket gateway, an LLM reasoning layer, a markdown-based memory system with write-ahead logging, and a skills execution layer. However, serious security issues have emerged: a WebSocket origin validation vulnerability allowed one-click full compromise, 20% of its plugin marketplace (Claw Hub) was found to contain malware, and over 30,000 instances are exposed on the public internet with no authentication. Six additional CVEs were recently disclosed. Safe usage recommendations include running it in Docker or rootless Podman with two-layer container isolation, binding the gateway to localhost only, and vetting every plugin before installation.
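The origin-validation vulnerability class is generic to any locally hosted WebSocket gateway. A minimal sketch of the check a gateway should perform on the handshake (allowlist values invented): browsers send an Origin header on WebSocket upgrades, so even a localhost-bound gateway must reject unknown origins, or any web page the user visits can connect to it.

```python
# Sketch: reject cross-origin WebSocket upgrade requests.
ALLOWED_ORIGINS = {"http://localhost:3000", "http://127.0.0.1:3000"}

def origin_allowed(headers: dict) -> bool:
    origin = headers.get("Origin")
    # Non-browser clients may omit Origin; require an explicit allowlist
    # entry rather than silently trusting a missing header.
    if origin is None:
        return False
    return origin in ALLOWED_ORIGINS
```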

  7. Article · Hacker News · 11w

    vercel-labs/just-bash: Bash for Agents

    just-bash is a TypeScript library from Vercel Labs that simulates a bash environment with an in-memory virtual filesystem, designed for AI agents needing a secure sandboxed shell. It supports a wide range of Unix commands (file ops, text processing, jq, sqlite3, Python via Pyodide), multiple filesystem backends (InMemoryFs, OverlayFs, ReadWriteFs, MountableFs), optional network access with URL allow-lists, execution protection against infinite loops, AST transform plugins, and a Vercel AI SDK integration via the companion bash-tool package. A CLI binary and interactive shell are also included.

  8. Article · InfoWorld · 14w

    AI is not coming for your developer job

    Agentic AI excels at deterministic coding tasks like writing, refactoring, and validating code, but lacks the strategic context and human interpretation needed for real engineering work. AI operates within fixed parameters and cannot adapt to shifting business priorities, customer needs, or strategic realignments that arrive through fragmented human communication. The future lies not in replacing developers but in AI handling mechanical tasks while humans focus on interpretation, strategy, and building with intent. For AI to become a true collaborator, it must understand evolving context—not just what code does, but whether it still matters given current priorities.

  9. Video · Matt Pocock · 11w

    Never Run claude /init

    Running `claude /init` generates a CLAUDE.md file that bloats the agent's system prompt with auto-discovered codebase documentation. This wastes tokens on every request, distracts the agent with irrelevant context, and quickly goes out of date as code changes. Research confirms that unnecessary requirements in context files make tasks harder. Instead, agents should rely on their built-in explore phase to discover context just-in-time. The only content worth putting in CLAUDE.md is truly global, non-obvious setup information (e.g., 'you are on WSL on Windows') — keep it to a minimum and let the file system and source code serve as the real source of truth.

  10. Article · Horde · 12w

    Here's how I use OpenClaw agents daily

    A developer shares their personal daily workflow using OpenClaw AI agents across a wide range of tasks: coding and code review, email scanning and invoice organization, trend monitoring from X/HN/Reddit/ProductHunt every two hours, content idea generation for YouTube and social media, sponsorship proposal handling, marketing and product sharing on awesome repos, cold outreach, Discord management, workout/nutrition planning, follow-up reminders, daily project improvement suggestions, and self-updating agents. The setup also features smart model selection to optimize results at the lowest cost.

  11. Article · Snowflake Community · 14w

    SKILLs MD for Analytics: How We Made Snowflake Intelligence Agents Reliable for Production

    PDQ solved AI agent hallucinations in production analytics by encoding Standard Operating Procedures as SKILLs—version-controlled, agent-executable contracts that define inputs, logic, validation, and guardrails. Instead of scaling multiple specialized agents with long prompts, they built one agent with a library of SKILLs deployed via Git-Ops to Snowflake Dynamic Tables and indexed through Cortex Search. This approach eliminated inconsistent answers, improved quality through mandatory validation steps, and made agent reasoning auditable like code. The architecture separates SKILL discovery (lightweight semantic search) from execution (loading complete SOPs) to preserve context windows while ensuring deterministic analytical workflows.
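PDQ's full SKILL format isn't published in the summary; the sketch below only illustrates the idea of a version-controlled contract with declared inputs, mandatory validation, and guardrails (all field names invented):

```python
# Sketch of a SKILL as an executable contract: the agent supplies inputs,
# the contract validates them and applies guardrails before anything runs.
SKILL = {
    "name": "weekly_revenue",
    "version": "1.2.0",
    "inputs": {"region": str, "week": str},
    "guardrails": {"max_rows": 10_000},
}

def run_skill(skill: dict, inputs: dict) -> dict:
    # Mandatory validation step: reject anything outside the contract.
    for name, typ in skill["inputs"].items():
        if name not in inputs or not isinstance(inputs[name], typ):
            raise ValueError(f"invalid or missing input: {name}")
    # Every run records the contract version that produced the answer,
    # which is what makes the reasoning auditable like code.
    return {"skill": skill["name"], "version": skill["version"],
            "inputs": inputs, "limit": skill["guardrails"]["max_rows"]}
```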

  12. Article · Ramp Engineering · 12w

    We fixed ~100 security issues in 6 days with 0 humans

    Ramp's security engineering team built a multi-agent pipeline that autonomously found, validated, and patched nearly 100 security vulnerabilities in their backend codebase in under a week, with no human involvement until PR review. The system used specialized detector agents for specific vulnerability classes (e.g., IDOR), adversarial manager agents to filter false positives (rejecting 40% of initial findings), a validator agent that wrote integration tests to confirm real issues, and a fixer agent that applied patches using test-driven development. The approach uncovered novel high-severity issues missed by penetration testing, bug bounties, and 10+ commercial scanning tools. The entire setup required only a four-hour hackathon and one week of work by a single engineer.
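The shape of that pipeline can be sketched with stubs standing in for the LLM agents; the control flow (detect, adversarially filter, validate with a test, then fix) is the point, not the stub bodies:

```python
# Sketch of the multi-agent pipeline shape; each function is a stub
# standing in for an LLM agent.
def detector(codebase):
    # Stand-in for a class-specific detector agent (e.g. for IDOR).
    return [{"id": 1, "kind": "IDOR", "plausible": True},
            {"id": 2, "kind": "IDOR", "plausible": False}]

def manager(findings):
    # Adversarial filter: tries to knock down each finding as a false positive.
    return [f for f in findings if f["plausible"]]

def validator(finding):
    # Stand-in for an agent that writes an integration test to confirm
    # the issue is real before any patch is attempted.
    return dict(finding, confirmed=True)

def fixer(finding):
    # Stand-in for a test-driven patch agent; output goes to human PR review.
    return dict(finding, patched=True)

def pipeline(codebase):
    confirmed = [validator(f) for f in manager(detector(codebase))]
    return [fixer(f) for f in confirmed]
```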

  13. Article · LangChain · 12w

    Agent Observability Powers Agent Evaluation

    Agent observability differs fundamentally from traditional software observability because agents are non-deterministic — you can't predict behavior until runtime. This post explains why debugging agents means debugging reasoning rather than code, introduces three core observability primitives (runs, traces, threads), and shows how these primitives map directly to three levels of agent evaluation: single-step (unit tests for decisions), full-turn (end-to-end trajectory), and multi-turn (context persistence across sessions). Production traces serve triple duty: manual debugging, building offline evaluation datasets from real failures, and powering continuous online evaluation. The key insight is that observability and evaluation are inseparable for agents — traces are the only source of truth for what an agent actually did.
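The mapping from primitives to evaluation levels can be made concrete with a small sketch (field choices are mine, not LangChain's schema): one run is a unit of single-step evaluation, one trace is a unit of full-turn evaluation, and one thread is a unit of multi-turn evaluation.

```python
# Sketch of the three observability primitives and the evaluation units
# each one yields.
from dataclasses import dataclass, field

@dataclass
class Run:            # one tool call or LLM decision
    name: str
    output: str

@dataclass
class Trace:          # all runs for a single user turn
    runs: list = field(default_factory=list)

@dataclass
class Thread:         # all traces in one conversation/session
    traces: list = field(default_factory=list)

def eval_units(thread: Thread) -> dict:
    """Count how many evaluation units each level extracts from one thread."""
    runs = [r for t in thread.traces for r in t.runs]
    return {"single_step": len(runs),
            "full_turn": len(thread.traces),
            "multi_turn": 1}
```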

  14. Video · bycloud · 13w

    LLM’s Billion Dollar Problem

    Token consumption in LLMs has exploded with thinking models and AI agents, creating scalability challenges. Standard attention mechanisms scale quadratically with context length, making long contexts prohibitively expensive. Three approaches attempt to solve this: sparse attention (restricts which tokens interact), linear attention (accumulates information in shared memory), and compressed attention (compresses tokens before comparison). While sparse and compressed attention help, only linear attention can theoretically scale past 1M context windows. Recent developments show hybrid approaches combining linear attention with standard or compressed attention achieving promising results, with Google's Gemini 3 Flash demonstrating breakthrough performance at 1M context length.
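A back-of-envelope comparison makes the scaling gap concrete: counting token-pair interactions per attention pass, growing context 100x multiplies quadratic cost by 10,000x but linear cost by only 100x.

```python
# Back-of-envelope: interactions per attention pass. Standard attention
# compares every token with every other token (quadratic); linear
# attention folds tokens into a fixed-size state (linear).
def quadratic_pairs(n: int) -> int:
    return n * n

def linear_ops(n: int, state_size: int = 1) -> int:
    return n * state_size

# Growing context from 10k to 1M tokens:
ratio_quadratic = quadratic_pairs(1_000_000) // quadratic_pairs(10_000)
ratio_linear = linear_ops(1_000_000) // linear_ops(10_000)
```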

  15. Article · MIT Technology Review · 14w

    Moltbook was peak AI theater

    Moltbook, a viral social network where AI agents interact, reveals more about current AI hype than future capabilities. While millions of LLM-powered agents posted content and formed communities, experts argue the platform demonstrates pattern-matching rather than true intelligence or autonomy. Humans remain involved at every step, from setup to prompting, making it more entertainment than emergent AI behavior. The experiment exposed security risks as agents with access to private data interact with unvetted content, but ultimately shows how far we are from truly autonomous AI systems.

  16. Article · Spotify Labs · 12w

    Our Multi-Agent Architecture for Smarter Advertising

    Spotify Engineering shares how they built a multi-agent AI system called Ads AI to solve fragmented media planning workflows across their advertising channels. Instead of duplicating decision logic per channel, they decomposed the media planning problem into specialized agents (RouterAgent, GoalResolverAgent, AudienceResolverAgent, BudgetAgent, ScheduleAgent, MediaPlannerAgent) that run in parallel using Google's Agent Development Kit (ADK) and Vertex AI (Gemini 2.5 Pro). The system takes natural language campaign requirements and generates optimized media plans in 5–10 seconds, down from 15–30 minutes of manual work. Key lessons include treating prompt engineering as software engineering, drawing careful agent boundaries, and grounding LLM outputs with real data via function-calling tools.
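The fan-out/fan-in shape of that decomposition can be sketched in plain asyncio (agent names follow the post; the bodies are stubs, and the real system runs LLM agents on Google's ADK with Gemini 2.5 Pro):

```python
# Sketch: specialist agents resolve their slice of the campaign brief
# concurrently, then a planner merges the results into one media plan.
import asyncio

async def goal_resolver(brief):      return {"goal": "awareness"}
async def audience_resolver(brief):  return {"audience": "18-34"}
async def budget_agent(brief):       return {"budget": 50_000}
async def schedule_agent(brief):     return {"schedule": "Q2"}

async def media_planner(brief: str) -> dict:
    parts = await asyncio.gather(
        goal_resolver(brief), audience_resolver(brief),
        budget_agent(brief), schedule_agent(brief),
    )
    plan = {}
    for part in parts:
        plan.update(part)
    return plan
```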

  17. Article · The Miners · 11w

    The Double Standard Is Killing AI Adoption in Your Team

    Developers apply a double standard when reviewing AI-generated code, demanding perfection from agents while routinely approving untested, poorly structured human-written code. Drawing on Linus Torvalds' 1992 defense of Linux against Tanenbaum's microkernel critique and Richard Gabriel's 'Worse is Better' essay, the argument is that shipping functional, tested code has always mattered more than theoretical elegance. AI-generated code that compiles, runs, and includes tests deserves the same pragmatic review standard applied to human code — not a higher bar.

  18. Article · Max Woolf's Blog · 11w

    An AI agent coding skeptic tries AI agent coding, in excessive detail

    A self-described AI agent skeptic documents their journey from dismissing agentic coding to becoming a cautious convert after working with Claude Opus 4.5 and OpenAI Codex. The author shares detailed real-world experiments: building a YouTube scraper, a FastAPI webapp, Rust packages with Python bindings (icon rendering, word clouds, a terminal MIDI DAW, a physics simulator), and ultimately developing high-performance Rust implementations of ML algorithms (UMAP, HDBSCAN, GBDT) that outperform existing C/C++ libraries by 2-100x. Key insights include the importance of a well-crafted AGENTS.md file for controlling agent behavior, chaining Codex and Opus for iterative optimization, and the value of having approximate domain knowledge to audit agent output. The author remains measured—acknowledging real productivity gains while resisting hype—and open-sources all projects.

  19. Article · LangChain · 12w

    How we built Agent Builder’s memory system

    LangSmith Agent Builder uses a filesystem-based memory system to give task-specific agents persistent, evolving knowledge across sessions. Memory is stored as files in Postgres but exposed to the agent as a virtual filesystem, mapping to COALA's memory taxonomy: procedural (AGENTS.md, tools.json) and semantic (skill files, knowledge files), with episodic memory planned. Agents update their own memory in the hot path as they work, with human-in-the-loop approval to guard against prompt injection. Key learnings: prompting is the hardest part, agents need help compacting generalizations, and explicit memory commands are still useful. Future work includes background memory processes, a /remember command, semantic search over memory, and user/org-level memory scopes.
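The core idea, memory exposed as a filesystem while stored as rows, can be sketched minimally (a dict stands in for the Postgres table; paths and contents are invented):

```python
# Sketch of memory-as-filesystem: the agent sees ordinary path-based
# read/write/list tools, while storage is a table of (path, content) rows.
class MemoryFS:
    def __init__(self):
        self._rows = {}                    # path -> content

    def write(self, path: str, content: str) -> None:
        self._rows[path] = content

    def read(self, path: str) -> str:
        return self._rows[path]

    def ls(self, prefix: str = "") -> list:
        return sorted(p for p in self._rows if p.startswith(prefix))

fs = MemoryFS()
fs.write("AGENTS.md", "Always run `make lint` before committing.")
fs.write("skills/summarize.md", "## Summarize\nKeep it under 100 words.")
```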

  20. Article · David Heinemeier Hansson · 14w

    Clankers with claws

    OpenClaw enables AI models to operate persistently with their own machine, memory, and execution environment beyond simple prompt-response cycles. Testing shows modern AI agents can navigate human-centric digital interfaces without specialized tools, APIs, or MCPs—successfully signing up for services, managing email, and completing multi-step tasks autonomously. This demonstrates that AI agents may soon interact with software through standard user interfaces rather than requiring dedicated machine-readable protocols.

  21. Article · Hugging Face · 13w

    Custom Kernels for All from Codex and Claude

    HuggingFace built an agent skill that teaches AI coding agents (Claude, Codex) to write production-ready CUDA kernels with PyTorch bindings. The skill packages domain expertise about GPU architectures, memory patterns, and library integration into ~550 tokens of structured guidance. Testing on LTX-Video (diffusers) and Qwen3-8B (transformers) showed the agent-generated RMSNorm kernels achieved 1.88-1.94x speedup over PyTorch baselines, with 6% end-to-end improvement in video generation. The skill integrates with HuggingFace's Kernel Hub for distribution, enabling developers to generate, benchmark, and publish optimized kernels without deep CUDA expertise.
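For reference, the operation those kernels accelerate is simple to state: RMSNorm scales each element by the reciprocal root-mean-square of the vector, then by a learned per-channel weight. A pure-Python reference (this shows what the CUDA kernel computes, not the kernel itself):

```python
# Pure-Python reference for RMSNorm.
import math

def rms_norm(x, weight, eps: float = 1e-6):
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i"""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```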

  22. Article · Viget · 12w

    Pointless explorations of Obsidian & Claude Code

    A three-person team at Viget used their internal hackathon to explore integrating Claude Code with Obsidian vaults for knowledge work automation. They built an Obsidian plugin that auto-summarizes web clips (including YouTube transcripts), enables multi-file LLM synthesis, and adds multi-select to the file explorer. The stack used TypeScript, Vercel AI SDK, and youtubei.js. A key takeaway was the 'compound engineering' workflow: after each session, updating CLAUDE.md with lessons learned to prevent repeated mistakes. Claude Code took them from a three-bullet plan to a working MVP in ~30 minutes, though they note it requires tighter guardrails for high-stakes projects.

  23. Article · MLflow · 14w

    Introducing DeepEval, RAGAS, and Phoenix Judges in MLflow

    MLflow 3.9.0 integrates over 50 evaluation metrics from DeepEval, RAGAS, and Arize Phoenix frameworks into a unified API. This integration enables developers to evaluate LLM agents and RAG systems using multiple judge frameworks simultaneously, compare results side-by-side in MLflow UI, and access specialized metrics for conversational agents, retrieval quality, hallucination detection, and safety. The unified interface eliminates the need for custom wrappers and provides visualization, filtering, and iteration tools for improving agent quality before production deployment.

  24. Article · Laravel · 11w

    Your AI Agent Can Now Deploy to Laravel Cloud (and Write This Blog)

    Laravel Cloud has opened early access to its REST API, enabling AI agents to manage cloud infrastructure through conversation. Florian Beer, an Infrastructure Engineer at Laravel, built a 400-line bash CLI wrapper called the `laravel-cloud` skill that covers all 19 resource categories of the API — deployments, databases, caches, domains, scaling, and more. The skill is installable via `clawhub install laravel-cloud` and works with any OpenClaw-compatible AI agent setup. The post (itself written by an AI bot) explains the design rationale for using bash (zero setup friction), demonstrates example agent interactions, and argues that clean, well-documented APIs like Laravel Cloud's are well-positioned for the emerging era of action-taking AI agents.

  25. Article · freeCodeCamp · 11w

    Learn Python and Build Autonomous Agents

    A 6-hour freeCodeCamp course covering Python fundamentals through AI agent development. The curriculum spans four modules: Python essentials (variables, loops, functions), data science foundations (NumPy, Pandas, SQLite), API development (REST APIs, FastAPI), and AI/LLM integration including ChatGPT, Gemini, and open-source HuggingFace models for building autonomous agents.