Materialize's Field CTO explains the engineering challenges of streaming live, transactionally consistent operational data into Apache Iceberg, which was designed for batch ETL. Key problems addressed include: avoiding memory-heavy buffering by minting batch descriptions ahead of time so workers can stream data to S3
Table of contents
How Materialize Thinks About Consistency
The Naive Approach
Minting Batch Descriptions Ahead of Time
The Delete Problem
Recovery Without External State
The Empty Snapshot Problem
Multi-Table Transactions