Postgres logical replication omits unchanged TOAST column values in CDC events for tables without replica identity FULL, causing downstream consumers to receive incomplete row state. Three solutions are explored: Debezium's built-in reselect post processor (simple but prone to data races and extra load on the source database), the Apache Flink DataStream API with a KeyedProcessFunction managing per-record state, and Flink SQL using either OVER aggregation or the upcoming Process Table Functions (PTFs) in Flink 2.1. PTFs offer the best balance: SQL simplicity with imperative state control, precise state lifecycle management tied to delete events, and reusable encapsulation. A future Debezium-native solution that keeps state in embedded RocksDB or SlateDB is also proposed.
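The state-based approaches share one idea: remember the last known value of each TOAST column per primary key, substitute it whenever a change event carries the placeholder instead of the value, and drop the state when the row is deleted. The following is a minimal sketch of that idea in plain Python, not the article's Flink code. The class `ToastBackfiller`, its method names, and the event shape are hypothetical; the placeholder string is Debezium's default for unavailable values, and `c`/`u`/`d` are Debezium's operation codes.

```python
from typing import Any, Dict, Iterable, Optional

# Debezium's default placeholder for a TOAST value that was not sent
UNAVAILABLE = "__debezium_unavailable_value"


class ToastBackfiller:
    """Hypothetical helper: keeps the last known TOAST values per key."""

    def __init__(self, toast_columns: Iterable[str]):
        self.toast_columns = list(toast_columns)
        # primary key -> {column name -> last known value}
        self.state: Dict[Any, Dict[str, Any]] = {}

    def process(self, key: Any, op: str, after: Optional[dict]) -> Optional[dict]:
        # A delete ends the row's lifecycle, so its state can be dropped.
        if op == "d":
            self.state.pop(key, None)
            return None

        row = dict(after or {})
        known = self.state.setdefault(key, {})
        for col in self.toast_columns:
            if col not in row:
                continue
            if row[col] == UNAVAILABLE:
                # Value unchanged at the source: restore it if we have seen it.
                if col in known:
                    row[col] = known[col]
            else:
                # Value present in the event: remember it for later events.
                known[col] = row[col]
        return row


backfiller = ToastBackfiller(["bio"])
backfiller.process(1, "c", {"id": 1, "name": "a", "bio": "a very long text"})
fixed = backfiller.process(1, "u", {"id": 1, "name": "b", "bio": UNAVAILABLE})
print(fixed["bio"])
```

If no earlier event for the key has been seen, the placeholder is passed through unchanged, which is why a stateful solution needs an initial snapshot or a backfill of existing rows to be complete.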

16 min read time
From morling.dev
Table of contents
Debezium Reselect Postprocessor
Flink DataStream API
Flink SQL With OVER Aggregation
Flink Process Table Functions
Summary and Discussion
