How LinkedIn Built a Next-Gen Service Discovery for 1000s of Services
LinkedIn replaced its decade-old Zookeeper-based service discovery system with a next-generation architecture using Kafka for writes and gRPC/xDS for reads. The new system handles hundreds of thousands of service instances with 10x better median latency (P50 < 1s vs 10s) and 6x better P99 latency. Key improvements include horizontal scalability through Go-based Observer components, eventual consistency over strong consistency, multi-language support via xDS protocol, and cross-fabric capabilities. The migration used a dual-mode strategy where applications ran both systems simultaneously, with automated dependency analysis to safely transition thousands of services without downtime.
