Schema proliferation from one-to-one event-to-schema mapping creates compounding complexity in Kafka and Flink pipelines: fragmented queries requiring multi-table UNIONs, high maintenance overhead when shared fields change, and schema drift across independently maintained schemas. The solution is discriminator-based schema consolidation, which collapses multiple event variants into a single schema using enum discriminator fields (eventType, rideType) and nullable attribute blocks for variant-specific data. A ride-sharing example shows how 12 schemas collapse to 2 tables. Implementation uses a two-layer adapter pattern in Flink: pure transformation adapter classes (framework-independent, easily unit-tested) plus a framework integration layer. Apache Avro with Full or Full_Transitive Schema Registry compatibility handles safe evolution — new variants add nullable blocks without breaking existing consumers. Trade-offs include wider records, governance overhead, and changed debugging workflows.

13m read timeFrom infoq.com
Post cover image
Table of contents
IntroductionWhat One-to-One Mapping Looks Like at ScaleThe Problem: Schema ProliferationThe Solution: Consolidated Schema DesignImplementing This Pattern in a Flink PipelineSchema Evolution with Apache AvroTrade-offsWhat This Approach Looks Like in PracticeBeyond the Adapter: Native Multi-Event SupportConclusionAbout the Author

Sort: