Halodoc shares its end-to-end experience migrating Amazon MSK from ZooKeeper to KRaft (Kafka 4.0.x) with zero customer impact. Because AWS removed ZooKeeper support from MSK 4.x onward, the team ran a POC validating performance parity and client compatibility (including a required confluent-kafka-go upgrade from v1.8.2 to v2.6.1 for Go services), then executed a canary deployment strategy, progressively shifting traffic from 20% to 100% onto the new KRaft cluster. The post covers the full migration runbook: environment setup, Schema Registry migration, phased traffic cutover steps, monitoring across four dimensions, and three pre-defined rollback scenarios. Key learnings include treating Schema Registry migration as a first-class concern, auditing Kafka client library versions early, and building observability dashboards before migration day.

10m read time. From blogs.halodoc.io
Table of contents

Introduction
Why Move from ZooKeeper to KRaft?
POC Validation: Proving Functional Parity Before Production
Why Standard Upgrades Don't Apply
Our Migration Strategy: Canary Deployment
Pre-Migration Readiness Checklist
Production Migration Execution
Monitoring During Migration
Rollback Plan
Key Learnings
Conclusion
