A sponsored benchmark compares Supermetal's new Iceberg sink against Apache Flink, Kafka Connect (Debezium), and Apache Spark for CDC-based Postgres-to-Iceberg pipelines. The tests used TPC-H SF=50 data on identical single-node AWS infrastructure. Supermetal completed the initial snapshot in 13 minutes with no tuning. Flink took 90–116 minutes, Kafka Connect took 120 minutes, and Spark took more than 3 hours. The article credits three differentiators: Supermetal's fast CDC source, its low serialization overhead, and its ability to change Iceberg sink behavior between phases. During the snapshot it writes append-only files sized to a target file size. During live CDC it switches to merge-on-read with time-based flushes. The article describes this phase switch as unique to Supermetal among the tools tested. Flink required aggressive tuning of fetch and split sizes, and Kafka Connect needed careful batch tuning. Spark struggled on a single node because its architecture is built to scale out. All tools produced correct data with matching row counts.
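The phase switch can be sketched as a flush policy that changes when the pipeline moves from snapshot to live CDC. This is a minimal, hypothetical illustration and not Supermetal's actual API or configuration. The names `Phase`, `SinkPolicy`, `policy_for`, and `should_flush` are invented for this sketch. The 512 MiB target mirrors Iceberg's default `write.target-file-size-bytes`, and the 60-second CDC flush interval is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    SNAPSHOT = "snapshot"  # bulk copy of existing rows
    CDC = "cdc"            # streaming inserts/updates/deletes


@dataclass(frozen=True)
class SinkPolicy:
    write_mode: str                     # "append" or "merge-on-read" (hypothetical labels)
    target_file_bytes: Optional[int]    # size-based flush trigger, if any
    flush_interval_s: Optional[float]   # time-based flush trigger, if any


# Assumed values: 512 MiB matches Iceberg's default target file size;
# the 60 s interval is purely illustrative.
SNAPSHOT_POLICY = SinkPolicy("append", 512 * 1024 * 1024, None)
CDC_POLICY = SinkPolicy("merge-on-read", None, 60.0)


def policy_for(phase: Phase) -> SinkPolicy:
    """Pick the sink behavior for the current pipeline phase."""
    return SNAPSHOT_POLICY if phase is Phase.SNAPSHOT else CDC_POLICY


def should_flush(policy: SinkPolicy, buffered_bytes: int, seconds_since_flush: float) -> bool:
    """Flush when the size trigger (snapshot) or the time trigger (CDC) fires."""
    if policy.target_file_bytes is not None and buffered_bytes >= policy.target_file_bytes:
        return True
    if policy.flush_interval_s is not None and seconds_since_flush >= policy.flush_interval_s:
        return True
    return False
```

During the snapshot, only file size matters, so the sink writes a few large append-only files. During CDC, a time trigger bounds latency, and merge-on-read defers the cost of rewriting files for updates and deletes.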

14-minute read, from thenewstack.io
Table of contents: Test setup, Supermetal, Flink, Kafka Connect, Spark, Data correctness, Summary
