Best of Change Data Capture · April 2026

  1. Article · ByteByteGo · 8w

    How Datadog Redefined Data Replication

    Datadog's Metrics Summary page suffered 7-second p90 latency because of expensive joins between 82K metrics and 817K configurations in Postgres. The root cause was using a transactional database for search workloads. The fix was Change Data Capture (CDC): Debezium streams Postgres WAL changes into Kafka, and from there into a dedicated search platform. Datadog chose asynchronous replication for resilience at scale and accepted brief replication lag as the tradeoff. To handle schema evolution safely, they built automated SQL validation and adopted a Kafka Schema Registry with Avro serialization to enforce backward compatibility. Finally, they used Temporal to automate pipeline provisioning end-to-end. This turned a one-off fix into a company-wide data replication platform that supports Postgres-to-Postgres, Postgres-to-Iceberg, Cassandra, and cross-region Kafka pipelines.
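
    The schema-evolution guardrail can be illustrated with a simplified backward-compatibility check. This is a sketch under stated assumptions, not Datadog's validator or the Schema Registry's implementation. The `Field` class and `backward_compatible` function are hypothetical. Each schema is modeled as a flat, Avro-like list of fields. Unions, nested records, and Avro type promotions (such as int to long) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field of a flat, Avro-like record schema (hypothetical, simplified)."""
    name: str
    type: str                  # e.g. "string", "long"; no unions or nested records
    has_default: bool = False


def backward_compatible(old: list[Field], new: list[Field]) -> list[str]:
    """Return reasons a reader on `new` could not decode data written with `old`.

    Simplified BACKWARD rule: removing a field is fine (the reader ignores it),
    adding a field is fine only if it has a default, and a shared field must
    keep its exact type (real Avro also allows promotions, ignored here).
    """
    old_by_name = {f.name: f for f in old}
    problems = []
    for field in new:
        prev = old_by_name.get(field.name)
        if prev is None:
            if not field.has_default:
                problems.append(f"added field '{field.name}' has no default")
        elif prev.type != field.type:
            problems.append(
                f"field '{field.name}' changed type {prev.type} -> {field.type}"
            )
    return problems


v1 = [Field("id", "long"), Field("name", "string")]
v2_ok = [Field("id", "long"), Field("name", "string"),
         Field("tags", "string", has_default=True)]
v2_bad = [Field("id", "string"), Field("owner", "string")]  # drops "name", too

print(backward_compatible(v1, v2_ok))   # []
print(backward_compatible(v1, v2_bad))
# ["field 'id' changed type long -> string", "added field 'owner' has no default"]
```

    A check like this runs before a schema change is deployed. A non-empty result blocks the change instead of letting downstream consumers fail on undecodable records.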

  2. Article · ByteByteGo · 7w

    Nextdoor’s Database Evolution: A Scaling Ladder

    Nextdoor's engineering team evolved their database architecture through a series of deliberate scaling steps. Starting from a single PostgreSQL instance, they added PgBouncer for connection pooling to address PostgreSQL's process-per-connection bottleneck. They then introduced a primary-replica architecture with time-based dynamic routing, which handles read-heavy traffic while preserving read-your-own-writes consistency. A Valkey (Redis-compatible) look-aside cache, using MessagePack serialization and Zstd compression, was layered on top for speed. To keep stale data out of the cache, they implemented a versioning system using PostgreSQL triggers and atomic Lua scripts for compare-and-set updates. Debezium-based Change Data Capture then acts as a self-healing reconciliation mechanism. Sharding by neighborhood ID is the last rung of the ladder, reserved for write-heavy growth.
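
    The versioned cache can be sketched as a compare-and-set that refuses to overwrite a newer entry with an older one. This is an illustrative in-memory model, not Nextdoor's code. The `VersionedCache` class and `set_if_newer` method are hypothetical. A `threading.Lock` stands in for the server-side atomicity a Lua script would provide. The integer version is assumed to be a per-row counter that a database trigger increments on each update.

```python
import threading
from typing import Any, Optional


class VersionedCache:
    """In-memory stand-in for a look-aside cache with version-guarded writes.

    Hypothetical sketch: each entry stores (version, value). In Valkey/Redis the
    compare-and-set would run as one Lua script so it is atomic on the server;
    here a lock plays that role.
    """

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, Any]] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            entry = self._data.get(key)
            return entry[1] if entry is not None else None

    def set_if_newer(self, key: str, version: int, value: Any) -> bool:
        """Store `value` only if `version` is newer than the cached version."""
        with self._lock:
            current = self._data.get(key)
            if current is not None and current[0] >= version:
                return False  # stale write, e.g. a slow reader racing a newer update
            self._data[key] = (version, value)
            return True


cache = VersionedCache()
# Versions stand in for a counter that a trigger (assumed) bumps on every UPDATE.
cache.set_if_newer("user:42", 2, {"name": "Ada"})
stale = cache.set_if_newer("user:42", 1, {"name": "Old"})  # late fill from an older read
print(stale, cache.get("user:42"))  # False {'name': 'Ada'}
```

    Because the comparison and the write happen as one atomic step, two concurrent cache fills cannot interleave and leave the older row image in place.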

  3. Article · Database Daily · 4w

    Why PostgreSQL CDC Breaks in Production

    PostgreSQL CDC failures in production rarely stem from an unreliable WAL. The real culprits are workflow-level issues:

    - an initial load and a CDC stream that do not share the same WAL boundary
    - checkpoints that advance before writes are durable
    - non-idempotent retries
    - ordering broken by parallel workers
    - lag hidden by long-running transactions
    - schema changes handled too late

    These failure patterns apply broadly to database replication and migration pipelines, where recovery semantics, ordering, and restart behavior matter more than simply reading changes.
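
    Several of these failure modes (non-idempotent retries, ordering broken by parallel workers, checkpoints that run ahead of durable writes) share one defensive pattern. The apply step is made idempotent per key, and the checkpoint advances only after the writes succeed. The sketch below illustrates that pattern under stated assumptions and is not taken from the article. `ChangeEvent`, `Sink`, and `process_batch` are hypothetical names. LSNs are simplified to integers. The LSNs of changes to any single row are assumed to increase in commit order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                 # WAL position of the change (simplified to an int)
    key: str                 # primary key of the changed row
    value: Optional[dict]    # new row image; None means DELETE


@dataclass
class Sink:
    """Hypothetical target store: rows plus the last LSN applied to each key."""
    rows: dict = field(default_factory=dict)
    row_lsn: dict = field(default_factory=dict)
    checkpoint: int = 0      # highest LSN known to be applied

    def apply(self, ev: ChangeEvent) -> None:
        # Idempotent: an event at or below the row's last LSN is a no-op,
        # so a retried batch cannot apply the same change twice.
        if self.row_lsn.get(ev.key, -1) >= ev.lsn:
            return
        if ev.value is None:
            self.rows.pop(ev.key, None)
        else:
            self.rows[ev.key] = ev.value
        self.row_lsn[ev.key] = ev.lsn  # remembered even for deletes (tombstone)


def process_batch(sink: Sink, events: list[ChangeEvent]) -> None:
    # Re-sort by LSN in case parallel workers fetched events out of order.
    for ev in sorted(events, key=lambda e: e.lsn):
        sink.apply(ev)
    # Advance the checkpoint only after every write above has succeeded;
    # a real sink would do this after a durable commit or flush.
    if events:
        sink.checkpoint = max(sink.checkpoint, max(e.lsn for e in events))


sink = Sink()
batch = [
    ChangeEvent(lsn=30, key="a", value=None),        # later DELETE arrives first
    ChangeEvent(lsn=10, key="a", value={"qty": 1}),
    ChangeEvent(lsn=20, key="b", value={"qty": 5}),
]
process_batch(sink, batch)
process_batch(sink, batch)  # a retry of the same batch changes nothing
print(sink.rows, sink.checkpoint)  # {'b': {'qty': 5}} 30
```

    If the process crashes between applying events and advancing the checkpoint, restarting from the old checkpoint simply replays events. The per-key LSN guard turns that replay into a no-op.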

  4. Article · Debezium · 4w

    Debezium 3.6.0.Alpha1 Released

    Debezium 3.6.0.Alpha1 is the first preview release in the 3.6 cycle, resolving 100 issues across the entire ecosystem. Key highlights include: a new Docling SMT for enriching change events with structured document output suited for AI/RAG workflows; a new Amazon SNS sink for Debezium Server enabling direct CDC fan-out to AWS subscribers; a reworked Oracle LogMiner batch-sizing algorithm based on log file size/count instead of hard-to-tune SCN properties; OpenTelemetry tracing support for the Cassandra connector; OAuth2 authentication and batch mode for the HTTP sink; Agroal replacing C3P0 as the JDBC sink connection pool; and Helm chart improvements for the Debezium Operator and Platform. Breaking changes affect MongoDB (Avro schema name sanitization for collection names starting with digits) and PostgreSQL (enum values now returned in logical sort order rather than storage order).