Halodoc shares its end-to-end experience migrating Amazon MSK from ZooKeeper to KRaft (Kafka 4.0.x) with zero customer impact. Because AWS removed ZooKeeper support from MSK 4.x onward, the team first ran a POC to validate performance parity and client compatibility (including a required confluent-kafka-go upgrade from v1.8.2 to v2.6.1 for Go services), then executed a canary deployment, progressively shifting traffic from 20% to 100% onto the new KRaft cluster. The post covers the full migration runbook: environment setup, Schema Registry migration, phased traffic cutover steps, monitoring across four dimensions, and three pre-defined rollback scenarios. Key learnings include treating Schema Registry migration as a first-class concern, auditing Kafka client library versions early, and building observability dashboards before migration day.
Table of contents
Introduction
Why Move from ZooKeeper to KRaft?
POC Validation: Proving Functional Parity Before Production
Why Standard Upgrades Don't Apply
Our Migration Strategy: Canary Deployment
Pre-Migration Readiness Checklist
Production Migration Execution
Monitoring During Migration
Rollback Plan
Key Learnings
Conclusion