Best of Testing, February 2026

  1. Article
    Addy Osmani · 10w

    Agentic Engineering

    Agentic engineering is a disciplined approach to AI-assisted software development that distinguishes itself from "vibe coding" through human oversight and engineering rigor. While vibe coding means accepting AI output without review (useful for prototypes and MVPs), agentic engineering involves treating AI agents as tools that handle implementation under careful human direction. The workflow requires writing specs before prompting, reviewing every diff, running comprehensive test suites, and maintaining ownership of the codebase. This approach disproportionately benefits senior engineers with strong fundamentals, as it trades typing time for review time and demands architectural thinking over raw code generation. The rise of AI coding raises rather than lowers the bar for software engineering craft.

  2. Article
    Bitfield Consulting · 8w

    Go the right way: the Zen of Go coding

    Ten principles for writing high-quality Go code: structure code as reusable packages, write comprehensive tests, prioritize readability, design safe-by-default APIs, wrap errors properly, avoid mutable global state, use structured concurrency sparingly, decouple from environment specifics, handle errors gracefully, and log only actionable information. Emphasizes making code work first, then refactoring for quality while keeping maintenance in mind.

  3. Article
    Tech World With Milan · 7w

    What I learned from the book Software Engineering at Google

    A detailed breakdown of key lessons from the book 'Software Engineering at Google', covering the distinction between programming and engineering, Hyrum's Law, the Beyoncé Rule, shift-left testing, why mocking frameworks are discouraged in favor of fakes, code review best practices, small frequent releases, dependency management, the GSM productivity framework, and engineering culture. The post also includes honest admissions from the authors about what doesn't work even at Google, and closes with practical takeaways applicable to teams of any size.
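The fakes-over-mocks preference can be sketched in a few lines; the book's examples are in other languages, and all names below are illustrative, not from the book. A fake is a lightweight working implementation, where a mock merely records calls:

```python
class UserStore:
    """Interface the production code depends on."""
    def save(self, user_id, name): ...
    def load(self, user_id): ...

class FakeUserStore(UserStore):
    """In-memory fake: behaves like the real store, no database needed."""
    def __init__(self):
        self._rows = {}
    def save(self, user_id, name):
        self._rows[user_id] = name
    def load(self, user_id):
        return self._rows.get(user_id)

def rename_user(store, user_id, new_name):
    # Code under test: only renames users that already exist.
    if store.load(user_id) is None:
        return False
    store.save(user_id, new_name)
    return True

store = FakeUserStore()
store.save("u1", "Ada")
assert rename_user(store, "u1", "Ada L.") is True
assert store.load("u1") == "Ada L."
assert rename_user(store, "u2", "Grace") is False
```

Because the fake actually behaves like the real dependency, tests exercise realistic interactions instead of the scripted expectations a mock would need.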

  4. Article
    System Design Newsletter · 10w

    I struggled to code with AI until I learned this workflow

    AI coding assistants work best through an iterative workflow rather than one-shot prompts. The key is providing comprehensive context (project background, constraints, relevant code), requesting a plan before implementation, generating code in small steps with defined roles (planner, implementer, tester, explainer), reviewing output with AI-assisted tools, writing tests immediately, and debugging systematically. Common pitfalls include context drift in long conversations, API version mismatches, and over-reliance on AI without understanding the output. The workflow emphasizes treating AI like a new teammate who needs explicit briefing, keeping changes small and reviewable, and maintaining human oversight throughout the process.

  5. Article
    thoughtbot · 9w

    Claude Code: Production ready code in a two-week sprint

    Thoughtbot demonstrates how to use Claude Code to build production-quality Rails applications through disciplined practices. The approach emphasizes small, controlled tasks, comprehensive test coverage, frequent commits with human review, and maintaining context through documentation. During a two-week sprint for TellaDraft, they integrated multiple AI services (ElevenLabs, WhisperAI, ChatGPT) while ensuring code quality through constant validation, proper testing patterns, and avoiding the pitfalls of "vibe coding" where AI generates unreviewed code.

  6. Article
    Bun · 9w

    Bun v1.3.9

    Bun v1.3.9 introduces parallel and sequential script execution with `--parallel` and `--sequential` flags, supporting workspace filtering and Foreman-style output. Testing improvements include `Symbol.dispose` support for automatic mock cleanup. Performance enhancements include SIMD-accelerated RegExp matching, faster string operations (trim, startsWith), optimized Markdown rendering, and ESM bytecode compilation support. HTTP/2 connection upgrades via net.Server now work correctly. Bug fixes address ARM64 crashes, Windows filesystem operations, WebSocket stability, and HTTP proxy keep-alive issues.

  7. Article
    Programming Digest · 9w

    The Phoenix Architecture

    The "deletion test" is a thought experiment: imagine deleting your entire codebase and regenerating it from scratch. If that's terrifying, it reveals that critical knowledge lives only in the code itself, not in specifications, tests, or contracts. As code generation becomes cheaper through AI, the bottleneck shifts from production to validation. Systems should be built around durable oracles (property-based tests, invariants, contracts) that can mechanically verify correctness without referencing old implementations. When you have strong evaluation mechanisms, code becomes disposable and regeneration becomes safe.
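A durable oracle of this kind would normally use a property-based testing library such as Hypothesis or QuickCheck; the following is a minimal hand-rolled sketch of the idea, with a deliberately disposable implementation:

```python
import random

# Implementation that could be deleted and regenerated at any time.
def dedup_sorted(xs):
    """Return the sorted unique elements of xs."""
    return sorted(set(xs))

# Durable oracle: properties any correct regeneration must satisfy,
# checked against random inputs rather than an old implementation.
def check_properties(fn, trials=500):
    rng = random.Random(42)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        out = fn(xs)
        assert out == sorted(out)          # output is ordered
        assert len(out) == len(set(out))   # no duplicates
        assert set(out) == set(xs)         # same underlying values
    return True

assert check_properties(dedup_sorted)
```

The properties never mention how `dedup_sorted` works internally, so any regenerated version that passes them is interchangeable with the original.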

  8. Article
    Baeldung · 10w

    Why We Should Not Mock Collections With Mockito

    Mocking Java collections like List, Set, or Map with Mockito is an anti-pattern that leads to brittle tests and unrealistic behavior. Collections are deterministic data structures, not external dependencies requiring isolation. On Java 21+, mocking collections may fail due to stricter JVM instrumentation rules. Instead of mocking, use real collection instances in tests to create clearer, more maintainable tests that focus on observable behavior rather than implementation details. This approach exposes design issues and encourages better separation of concerns.
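The same principle transposed from Java/Mockito into Python for illustration: scripting a mocked collection is brittle setup work, while a real collection is deterministic and needs none.

```python
from unittest.mock import MagicMock

def total(prices):
    # Code under test: sums a collection of prices.
    return sum(prices)

# Anti-pattern analogue: mocking the collection forces the test to
# script every interaction and drifts from real behavior.
mocked = MagicMock()
mocked.__iter__.return_value = iter([10, 20])
assert total(mocked) == 30   # passes only because __iter__ was scripted

# Preferred: real collections behave realistically with zero setup.
assert total([10, 20, 5]) == 35
assert total([]) == 0
```

Note the mock is even single-use here (the scripted iterator is exhausted after one pass), exactly the kind of unrealistic behavior the article warns about.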

  9. Article
    IT Revolution · 10w

    “No Vibe Coding While I’m On Call”: What Happens When AI Writes Your Production Code

    AI code generation without proper guardrails leads to production incidents. Through a fictional narrative of a company experiencing repeated outages from AI-generated code, the article illustrates four critical failure patterns: AI optimizing code without understanding system context, generating tests that pass but don't validate requirements, documenting features that don't exist, and eroding architectural resilience through incremental changes. The solution involves breaking AI tasks into small verifiable chunks, using AI to critique its own work, verifying documentation against actual code, establishing architectural reviews, and building observability from day one.

  10. Article
    Laravel News · 9w

    Nimbus: An In-Browser API Testing Playground for Laravel

    Nimbus is a Laravel package that provides an in-browser API testing playground for development. It automatically discovers routes and validation rules from FormRequest classes, Spatie Data objects, or OpenAPI specs. Key features include transaction mode for testing destructive operations without data loss, user impersonation for authorization testing, shareable request configurations, automatic test data generation, and dd() output handling. Unlike Swagger or Scribe, it's not for customer-facing documentation but rather a developer tool to speed up API iteration.

  11. Article
    ploeh blog · 7w

    TDD as induction

    Mark Seemann draws a metaphor between TDD and mathematical induction to explain why test-driven code tends to work across multiple environments in a non-linear fashion. Starting with an anecdote about locale-dependent test failures discovered when a UK developer joined a Danish team, he explores how tests make implicit assumptions about their execution context. He argues that once a test passes in one environment and then a second, it tends to pass in most subsequent environments — analogous to how mathematical induction generalizes from a base case via an inductive step. The post emphasizes that tests must explicitly state all relevant assumptions, warns against Ambient Context as an anti-pattern, and notes that Haskell's deterministic APIs help avoid such implicit environment dependencies.
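The post's point about stating assumptions explicitly can be sketched in Python (the post itself discusses .NET and Haskell; everything below, including the ambient-separator variable, is invented for illustration):

```python
AMBIENT_SEPARATOR = ","   # stands in for a process-wide locale setting

def format_price_ambient(value):
    # Implicit assumption: whatever the ambient separator happens to be.
    return f"{value:.2f}".replace(".", AMBIENT_SEPARATOR)

def format_price(value, decimal_separator):
    # Explicit assumption: the separator is a stated input, so the test
    # controls its entire execution context.
    return f"{value:.2f}".replace(".", decimal_separator)

# This test encodes a hidden assumption and breaks on a "Danish" machine:
# assert format_price_ambient(9.5) == "9.50"   # fails when separator is ","

# These tests state their assumptions and pass in any environment:
assert format_price(9.5, ".") == "9.50"
assert format_price(9.5, ",") == "9,50"
```

Passing the ambient dependency in as a parameter is one way to avoid the Ambient Context anti-pattern the post warns against.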

  12. Article
    LangChain · 7w

    Agent Observability Powers Agent Evaluation

    Agent observability differs fundamentally from traditional software observability because agents are non-deterministic — you can't predict behavior until runtime. This post explains why debugging agents means debugging reasoning rather than code, introduces three core observability primitives (runs, traces, threads), and shows how these primitives map directly to three levels of agent evaluation: single-step (unit tests for decisions), full-turn (end-to-end trajectory), and multi-turn (context persistence across sessions). Production traces serve triple duty: manual debugging, building offline evaluation datasets from real failures, and powering continuous online evaluation. The key insight is that observability and evaluation are inseparable for agents — traces are the only source of truth for what an agent actually did.
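The three primitives and three evaluation levels can be sketched as plain records; the field names below are illustrative, not LangChain's or LangSmith's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    """A single step: one model or tool call, with inputs and outputs."""
    name: str
    inputs: dict
    outputs: dict

@dataclass
class Trace:
    """One full turn: the ordered runs behind a single agent response."""
    runs: list = field(default_factory=list)

@dataclass
class Thread:
    """A multi-turn session: traces sharing conversational context."""
    traces: list = field(default_factory=list)

trace = Trace(runs=[
    Run("choose_tool", {"query": "2+2"}, {"tool": "calculator"}),
    Run("calculator", {"expr": "2+2"}, {"result": "4"}),
])
thread = Thread(traces=[trace])

# Single-step eval: did one decision look right?
assert trace.runs[0].outputs["tool"] == "calculator"
# Full-turn eval: did the trajectory end in the right place?
assert trace.runs[-1].outputs["result"] == "4"
# Multi-turn eval: is context carried across the session?
assert len(thread.traces) == 1
```

The same recorded data answers all three questions, which is the post's point that observability and evaluation share one substrate.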

  13. Article
    Developer's Journey · 8w

    When Testing Costs Money

    A developer shares a real-world experience integrating a third-party API that lacked a testing environment and only provided limited credits. Facing the constraint of minimizing costly API calls during development, the strategy involved thoroughly reading documentation, testing requests with Bruno (an offline API client), building a full mock implementation first, and only switching to the real API after all edge cases were handled. The approach resulted in fewer than 20 real API calls to complete the integration successfully.
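The mock-first strategy can be sketched as two interchangeable clients behind one integration function; the API shape here is invented for illustration and is not the article's actual service:

```python
class RealClient:
    """Each call here would spend a paid credit."""
    def geocode(self, address):
        raise NotImplementedError("swapped in only once the logic is proven")

class MockClient:
    """Free stand-in built from the docs, covering documented edge cases."""
    RESPONSES = {
        "1 Main St": {"lat": 51.5, "lon": -0.1},
        "": None,   # documented behavior: empty input yields no match
    }
    def geocode(self, address):
        return self.RESPONSES.get(address)

def lookup(client, address):
    # Integration code under development: identical against either client.
    result = client.geocode(address.strip())
    if result is None:
        return "no match"
    return (result["lat"], result["lon"])

# Develop and harden everything against the mock first...
client = MockClient()
assert lookup(client, " 1 Main St ") == (51.5, -0.1)
assert lookup(client, "") == "no match"
# ...then swap in RealClient() for the final handful of paid calls.
```

Because `lookup` only depends on the shared `geocode` shape, switching to the real API is a one-line change once every edge case passes against the mock.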

  14. Article
    Laravel News · 9w

    Laravel 12.51.0 Adds afterSending Callbacks, Validator whenFails, and MySQL Timeout

    Laravel 12.51.0 introduces notification afterSending callbacks for post-send logic, fluent whenFails and whenPasses validator methods for non-HTTP contexts, and a MySQL-specific query timeout method. The release adds closure support in firstOrCreate and createOrFirst for lazy evaluation of expensive operations, a BatchCancelled event for monitoring batch job cancellations, and the ability to use Eloquent builders directly as subqueries in updates. Additional improvements include a withoutHeader response method, enhanced batch testing assertions, cache isolation for parallel tests, and numerous bug fixes across database operations, string helpers, queue middleware, and framework internals.