Figma's original data pipeline relied on a daily full-table sync from Amazon RDS PostgreSQL to Snowflake. As the company grew, this approach became untenable: the largest tables took days to sync, and the dedicated export replicas cost millions of dollars annually. Figma rebuilt the pipeline around Change Data Capture (CDC), streaming change events through Kafka and applying incremental merges into Snowflake on a configurable schedule (every 3 hours by default, down to 30 minutes for billing data). They built the system in-house rather than adopting a vendor product because of cost and scale concerns. A rigorous validation workflow bootstraps an independent copy of each table weekly and compares it cell by cell against the incrementally synced version to catch silent failures. The results: data freshness dropped from more than 30 hours to under 3 hours, tables 10x larger are handled reliably, and eliminating the dedicated replicas saved millions of dollars per year.

11m read time · From blog.bytebytego.com
