Netflix built an internal automation platform to migrate nearly 400 RDS PostgreSQL production clusters to Amazon Aurora PostgreSQL with minimal downtime. The system uses a self-service workflow that handles physical read replica creation from storage snapshots, WAL replication validation, CDC slot coordination, controlled quiescence and cutover, and rollback safeguards. Because Netflix routes all database access through an Envoy-based data access layer that abstracts endpoints from application code, migrations happen transparently at the infrastructure level. A real-world case study shows how the team resolved an elevated OldestReplicationSlotLag caused by a stale logical replication slot before completing a successful migration for device certification and partner billing workloads.
Sort: