Trivago runs 50+ Kafka sink services across three regions (US, EU, ASIA) that materialize CDC events into service-local databases. Most sinks were idle the majority of the day yet consumed ~1 CPU core and 1 GB RAM each, wasting significant cluster capacity. CPU/memory-based autoscaling proved ineffective because sink workloads
16 min read time • From tech.trivago.com
Table of contents
Introduction / Context
Background: Current Data Flow
The Problem: Idle Sinks Burning Resources
Why Traditional Autoscaling Wasn’t Enough
Event-Driven Scaling with KEDA
Before vs After
Our Solution
What this gives us in practice
Edge case: consumer group cleanup for very low-traffic topics
Migration Path
Results
Conclusion