Best of Performance · March 2026

  1. Article
    SitePoint · 6w

    React 19 Compiler: What Senior Developers Need to Know

    The React Compiler (formerly React Forget), released as an opt-in beta alongside React 19, automates memoization by performing static analysis at build time rather than relying on developer-placed useMemo, useCallback, and React.memo hints. It integrates via a Babel plugin and enforces the Rules of React, silently skipping impure components. Senior developers can drop manual memoization ceremony from pure components, but must now audit codebases for purity violations, understand compiler-aware debugging via React DevTools, and adopt a deliberate migration strategy using 'use no memo' directives. The new senior skillset shifts from dependency array mastery to architectural thinking, component purity discipline, and understanding server/client boundary decisions.
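The compiler's output is conceptually a per-call-site cache keyed by the values a computation reads. A minimal TypeScript sketch of that caching strategy (the `createMemoSlot` helper is purely illustrative, not the compiler's actual emitted code):

```typescript
// Each call site gets a "slot" caching the last inputs and result, mimicking
// what the compiler emits in place of developer-placed useMemo.
type Slot = { deps: unknown[]; value: unknown } | null;

function createMemoSlot() {
  let slot: Slot = null;
  return function memo<T>(compute: () => T, deps: unknown[]): T {
    // Recompute only if any dependency changed since the last render.
    if (
      slot &&
      slot.deps.length === deps.length &&
      slot.deps.every((d, i) => Object.is(d, deps[i]))
    ) {
      return slot.value as T;
    }
    const value = compute();
    slot = { deps, value };
    return value;
  };
}

// Simulate two "renders" with the same props: compute runs exactly once.
let computeCount = 0;
const memo = createMemoSlot();
const render = (items: number[]) =>
  memo(() => {
    computeCount++;
    return items.filter((n) => n % 2 === 0);
  }, [items]);

const items = [1, 2, 3, 4];
render(items);
render(items); // same deps, cached result: no recompute
```

During migration, the 'use no memo' directive opts a component or file out of compilation while purity violations are fixed.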

  2. Article
    Nuxt · 5w

    Nuxt 4.4 · Nuxt Blog

Nuxt 4.4 ships several developer experience and performance improvements. Key additions include createUseFetch/createUseAsyncData factories for custom composable instances with typed defaults, an upgrade to vue-router v5 (dropping unplugin-vue-router), typed layout props via definePageMeta, and a new useAnnouncer composable for accessibility announcements. Route generation migrates to the unrouting library using a trie structure, delivering up to 28x faster dev server updates. Smarter payload handling for cached/ISR routes reduces redundant SSR re-renders in serverless environments. Other highlights: a useCookie refresh option for session expiration, useState reset-to-default behavior, improved import protection with traces and suggestions, view transition types support, build profiling via nuxt build --profile, and a module ID parsing optimization that is up to 14,000x faster.
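The factory pattern behind createUseFetch can be sketched in plain TypeScript; `createFetcher` and the fake transport below are hypothetical stand-ins, not Nuxt's actual API:

```typescript
// Factory pattern: build a pre-configured fetch helper that bakes in typed
// defaults (base URL, headers) so call sites stay terse.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<unknown>;

interface FetcherDefaults {
  baseURL: string;
  headers?: Record<string, string>;
}

function createFetcher(defaults: FetcherDefaults, fetchImpl: FetchLike) {
  return function useApi(
    path: string,
    init: { headers?: Record<string, string> } = {}
  ) {
    // Merge baked-in defaults with per-call overrides.
    return fetchImpl(defaults.baseURL + path, {
      headers: { ...defaults.headers, ...init.headers },
    });
  };
}

// Fake transport so the sketch is runnable without a network.
const calls: string[] = [];
const fakeFetch: FetchLike = async (url) => {
  calls.push(url);
  return { ok: true };
};

const useApi = createFetcher(
  { baseURL: "https://api.example.com", headers: { Accept: "application/json" } },
  fakeFetch
);
useApi("/users");
```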

  3. Article
    Lobsters · 6w

    Rust zero-cost abstractions vs. SIMD

    A full-text search query on turbopuffer was taking 220ms instead of the expected ~50ms. Profiling revealed that over 60% of runtime was spent in a merge iterator, not in BM25 ranking. The root cause: Rust's zero-cost iterator abstraction, while compiling each individual call efficiently, prevented the compiler from vectorizing or unrolling across calls due to the recursive nature of `next()`. The fix was a classic database technique — batched iterators — where `next_batch()` fills an array of 512 KV pairs at once, giving the compiler a tight inner loop it can auto-vectorize with SIMD. The result: the benchmark dropped from 6.5ms to 110μs (60× faster), and the production query latency fell from 220ms to 47ms. The key lesson: 'zero-cost' means the abstraction compiles away per call, not that it has no effect on the compiler's ability to optimize across calls.
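The batching pattern translates to any language, though the SIMD payoff is specific to the Rust original. A TypeScript sketch of the `next_batch()` shape, showing how a fixed-size buffer gives the consumer a tight inner loop:

```typescript
// Batched iteration: instead of one virtual call per item, nextBatch() fills
// a fixed-size buffer, so the consumer loops over a flat array.
// (In the Rust original, that tight loop is what unlocks SIMD
// auto-vectorization; this sketch shows only the batching shape.)
const BATCH = 512;

interface BatchedIter {
  // Fills `out` with up to BATCH values; returns how many were written.
  nextBatch(out: Float64Array): number;
}

function fromArray(values: number[]): BatchedIter {
  let pos = 0;
  return {
    nextBatch(out: Float64Array): number {
      const n = Math.min(BATCH, values.length - pos);
      for (let i = 0; i < n; i++) out[i] = values[pos + i];
      pos += n;
      return n;
    },
  };
}

function sum(iter: BatchedIter): number {
  const buf = new Float64Array(BATCH);
  let total = 0;
  let n: number;
  while ((n = iter.nextBatch(buf)) > 0) {
    // Tight inner loop over a flat buffer: no per-item indirection.
    for (let i = 0; i < n; i++) total += buf[i];
  }
  return total;
}
```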

  4. Article
    Hacker News · 6w

    Nobody ever got fired for using a struct

    A performance investigation at Feldera revealed that SQL tables with hundreds of nullable columns caused significant serialization overhead when mapped to Rust structs. The root issue: rkyv's ArchivedString loses Rust's niche optimization, forcing an explicit Option discriminant, and with 700+ optional fields the archived struct ballooned to 2x the in-memory size. The fix introduces a bitmap-based serialization layout that strips Option wrappers from the archived format and records nullability in a compact bitfield. A further sparse layout stores only present values with relative pointers, dramatically reducing disk I/O for wide, mostly-null rows. The customer's throughput was restored after serialized row size dropped by roughly 2x.
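A simplified TypeScript sketch of the bitmap idea (number-valued fields only; the real rkyv layout is byte-oriented and zero-copy): presence bits go in a compact bitfield, and only present values are stored:

```typescript
// Bitmap layout: one presence bit per nullable field, then only the present
// values, instead of an explicit Option discriminant per field.
function serialize(
  fields: (number | null)[]
): { bitmap: Uint8Array; values: number[] } {
  const bitmap = new Uint8Array(Math.ceil(fields.length / 8));
  const values: number[] = [];
  fields.forEach((v, i) => {
    if (v !== null) {
      bitmap[i >> 3] |= 1 << (i & 7); // set presence bit
      values.push(v);                 // sparse: store present values only
    }
  });
  return { bitmap, values };
}

function deserialize(
  n: number,
  { bitmap, values }: { bitmap: Uint8Array; values: number[] }
): (number | null)[] {
  const out: (number | null)[] = [];
  let vi = 0;
  for (let i = 0; i < n; i++) {
    out.push(bitmap[i >> 3] & (1 << (i & 7)) ? values[vi++] : null);
  }
  return out;
}
```

For a 700-column row the presence bitmap costs only 88 bytes, and a mostly-null row stores almost no values at all.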

  5. Article
    Hacker News · 5w

    The Cost of Indirection in Rust

Function call indirection in Rust async code is almost never a real performance concern. The Rust compiler in release mode frequently inlines extracted helper functions, producing assembly identical to manually inlined code. The actual costs worth worrying about are I/O, locks, and allocations — not function call boundaries, which cost only a few CPU cycles. Real indirection overhead only matters in tight inner loops, with dyn Trait dynamic dispatch, or in explicitly performance-critical paths. The post argues that premature inlining sacrifices code readability, testability, and maintainability for no measurable gain, and recommends using profilers like callgrind or perf to verify actual bottlenecks before optimizing. The right approach is to extract well-named functions, trust the optimizer, and only reach for #[inline] when profiler data justifies it.

  6. Article
    Hacker News · 3w

    We Rewrote JSONata with AI in a Day, Saved $500K/Year | Reco

Reco's principal data engineer used AI to rewrite JSONata (a JavaScript-based JSON query language) as a pure-Go library called gnata in about 7 hours, spending $400 in AI tokens. The original setup ran jsonata-js pods on Kubernetes, costing ~$300K/year and adding ~150μs of RPC overhead per evaluation across billions of events. gnata uses a two-tier evaluation architecture: a fast path for simple expressions that operates directly on raw JSON bytes with zero heap allocations, and a full path with complete JSONata 2.x semantics. A streaming layer batches N expressions against each event, reading raw bytes only once. After a week of shadow-mode validation with 1,778 test cases and 2,107 integration tests, gnata replaced the RPC fleet entirely, delivering 25-1000x speedups. Combined with a rule engine refactor enabled by gnata's batch evaluation capabilities, the total savings reached $500K/year in under two weeks of work.
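The two-tier split can be sketched as a dispatcher: a cheap gate sends simple dotted paths down a fast path and everything else to a full evaluator. All names here (`evaluate`, `fastPath`, `fullEval`) are illustrative, not gnata's API, and the real fast path works on raw JSON bytes rather than parsed objects:

```typescript
// Tier 1 gate: expressions that are just dotted identifier paths.
const SIMPLE_PATH = /^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$/;

// Fast path: direct key walk, no expression interpreter involved.
function fastPath(expr: string, doc: unknown): unknown {
  let cur: any = doc;
  for (const key of expr.split(".")) {
    if (cur == null || typeof cur !== "object") return undefined;
    cur = cur[key];
  }
  return cur;
}

// Placeholder for the slow, fully featured JSONata-semantics interpreter.
function fullEval(expr: string, doc: unknown): unknown {
  throw new Error(`full evaluator not implemented for: ${expr}`);
}

function evaluate(expr: string, doc: unknown): unknown {
  return SIMPLE_PATH.test(expr) ? fastPath(expr, doc) : fullEval(expr, doc);
}
```

The speedup comes from most production expressions hitting the fast path; the full path exists only to preserve complete semantics for the rest.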

  7. Article
    Quarkus · 6w

    Quarkus has great performance – and we have new evidence

    The Quarkus team published a new transparent, reproducible benchmark comparing Quarkus and Spring Boot performance. Results show Quarkus handles 2.7x more transactions per second (19,255 vs 7,238 tps), starts 2.3x faster, and uses half the memory. The benchmark addresses past shortcomings: outdated data, missing throughput metrics, and lack of reproducibility. The team open-sourced the benchmark code, invited Spring Boot community input to ensure fairness, and explored questions like virtual threads impact (+6k tps for all frameworks) and Spring Boot 3 vs 4 differences. The post also clarifies that while Quarkus JVM mode outperforms alternatives across all metrics, native compilation does cut throughput in half (though startup and memory improve dramatically), making native mode best suited for frequently restarted or low-workload applications.

  8. Article
    Figma · 4w

    How we rebuilt the foundations of component instances

    Figma's engineering team replaced a decade-old Instance Updater architecture with a new reactive system called Materializer. The old system handled component instances in a self-contained but increasingly fragile way, causing cascading updates and editor lockups in large design system files. The new architecture introduces push-based dependency tracking, automatic invalidation, and a shared runtime orchestration layer that cleanly separates layout, variable evaluation, and instance resolution. Common operations like variable mode changes improved by 40–50% in large files. The framework is generic enough that other Figma features like rich text and slots are now built on top of it, accelerating future development. Rollout involved months of parallel validation across hundreds of thousands of production files to ensure correctness and performance parity.
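Push-based dependency tracking can be sketched in a few lines: writes push a dirty flag downstream, and reads recompute lazily. This is a generic reactive-graph sketch, not Figma's Materializer API:

```typescript
// A node caches its computed value and knows its dependents; invalidation is
// pushed downstream, recomputation happens lazily on read.
class Node<T> {
  private dirty = true;
  private cached!: T;
  private dependents: Node<any>[] = [];

  constructor(private compute: () => T, deps: Node<any>[] = []) {
    for (const d of deps) d.dependents.push(this);
  }

  invalidate(): void {
    if (this.dirty) return; // already dirty: downstream was notified before
    this.dirty = true;
    for (const d of this.dependents) d.invalidate(); // push downstream
  }

  get(): T {
    if (this.dirty) {
      this.cached = this.compute();
      this.dirty = false;
    }
    return this.cached;
  }
}

// A mutable source node whose writes trigger invalidation.
function source<T>(initial: T): { node: Node<T>; set(v: T): void } {
  let value = initial;
  const node = new Node(() => value);
  return {
    node,
    set(v: T) {
      value = v;
      node.invalidate();
    },
  };
}

// e.g. a variable mode change invalidating a derived value:
const mode = source("light");
const theme = new Node(() => mode.node.get() + "-theme", [mode.node]);
```

The key property is that a write touches only the affected subgraph, instead of triggering the cascading whole-file updates the old Instance Updater suffered from.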

  9. Article
    Hacker News · 4w

    Rewriting our Rust WASM Parser in TypeScript

    A team built a custom DSL parser in Rust compiled to WASM, then discovered the bottleneck wasn't computation but the WASM-JS boundary overhead. Attempting to skip JSON serialization via serde-wasm-bindgen made things 30% slower due to fine-grained boundary crossings. Rewriting the parser in TypeScript eliminated the boundary entirely, yielding 2.2-4.6x faster per-call performance. They also fixed an O(N²) streaming problem by caching completed statement ASTs, reducing total streaming cost by 2.6-3.3x. Key lessons: WASM shines for compute-bound tasks with minimal interop, not for frequently-called parsers on small inputs where boundary overhead dominates.
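The O(N²) fix can be sketched as follows: cache the ASTs of completed statements and reparse only the trailing fragment on each streamed update. The `;`-delimited toy grammar below is an assumption standing in for the team's DSL:

```typescript
// Streaming parse with a completed-statement cache: each update reuses prior
// ASTs and reparses only the text after the last cached statement, turning
// O(N^2) total work into roughly O(N).
let parseCalls = 0;
function parseStatement(src: string): { src: string } {
  parseCalls++; // instrumented so the savings are observable
  return { src };
}

function makeStreamingParser() {
  const cache: { src: string }[] = []; // ASTs of completed statements
  let consumed = 0;                    // chars covered by the cache
  return function parse(full: string): { src: string }[] {
    const tail = full.slice(consumed); // only the new/incomplete region
    const parts = tail.split(";");
    const incomplete = parts.pop() ?? ""; // text after the last ';'
    for (const p of parts) {
      if (p.trim()) cache.push(parseStatement(p.trim()));
      consumed += p.length + 1; // statement + ';'
    }
    const asts = [...cache];
    if (incomplete.trim()) asts.push(parseStatement(incomplete.trim()));
    return asts;
  };
}
```

Parsing the growing stream `a;b;c` then `a;b;c;d` costs 5 statement parses instead of the 7 a from-scratch reparse would need; the gap widens with every chunk.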

  10. Article
    Hacker News · 3w

    The Three Pillars of JavaScript Bloat

    JavaScript dependency trees have grown bloated over time due to three main causes: (1) packages built for very old engines (ES3/IE6) or cross-realm safety that most developers no longer need, (2) atomic architecture where trivial one-liners like `Array.isArray(val) ? val : [val]` became their own npm packages with single consumers and supply chain risks, and (3) ponyfills for features now natively supported everywhere that were never removed. Tools like knip, the e18e CLI, npmgraph, and the module-replacements project can help identify and eliminate this bloat. The author argues the small group needing legacy compatibility should maintain their own special stack, while the majority should benefit from modern, lightweight dependencies.
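The cited one-liner illustrates the point: inlining it removes an entire dependency, and its supply-chain surface, with no change in behavior:

```typescript
// The whole "atomic package" in question, written inline: wrap a value in an
// array unless it already is one.
function toArray<T>(val: T | T[]): T[] {
  return Array.isArray(val) ? val : [val];
}
```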