Tinybird's storage architecture separates compute from storage using S3/GCS as the primary data store with local SSD caching on each replica. Key components include the Gatherer (batches streaming events before writing to avoid small-file proliferation), zero-copy replication (replicas share one S3 copy rather than duplicating data), and a custom packed part format developed in Tinybird's ClickHouse fork that consolidates MergeTree metadata files into a single S3 object, reducing write API calls by 30-40%. The upstream ClickHouse project deprecated zero-copy replication, which is a primary reason Tinybird maintains its own fork. Cache eviction is managed by ClickHouse with safeguards to prevent large scans from evicting hot data.
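The text above says the Gatherer batches streaming events before writing but does not describe its flush policy. The sketch below assumes a common one: flush when a batch reaches a row limit, or when its oldest event has waited long enough. Each flush becomes one write, and therefore one new part, instead of one tiny part per event. `EventBatcher`, `max_rows`, and `max_wait_s` are hypothetical names for illustration, not Tinybird's API.

```python
import time
from typing import Callable, List, Optional


class EventBatcher:
    """Buffer incoming events and flush them as one batch (hypothetical sketch).

    A flush happens when the buffer reaches max_rows, or when max_wait_s seconds
    have passed since the first buffered event, whichever comes first.
    """

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_rows: int = 10_000, max_wait_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[dict] = []
        self._first_ts: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self._buffer:
            self._first_ts = self._clock()  # start the age timer for this batch
        self._buffer.append(event)
        if len(self._buffer) >= self._max_rows:
            self.flush()  # size limit reached

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once it is old enough."""
        if self._buffer and self._clock() - self._first_ts >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_ts = None
            self._flush(batch)  # one write (one new part) per batch


written: List[List[dict]] = []
batcher = EventBatcher(written.append, max_rows=3, max_wait_s=60.0)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # drain the partial batch, e.g. on shutdown
print([len(batch) for batch in written])  # [3, 3, 1]
```

Seven events produce three writes rather than seven. The time-based trigger bounds how long a slow trickle of events can wait before it becomes visible to queries.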
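Packing reduces write API calls because object stores charge for and rate-limit individual requests, so uploading each small metadata file of a MergeTree part separately costs one PUT per file. The sketch below packs several files into a single object with a small offset index so that one PUT suffices. The layout (a 4-byte header length, a JSON index, then the concatenated bytes), the `FakeObjectStore` class, and the file contents are hypothetical; the file names are typical MergeTree part files used only as examples. This is not Tinybird's actual packed part format.

```python
import json
import struct
from typing import Dict


class FakeObjectStore:
    """In-memory stand-in for S3/GCS that counts PUT requests."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.put_calls = 0

    def put(self, key: str, data: bytes) -> None:
        self.put_calls += 1
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


def pack_files(files: Dict[str, bytes]) -> bytes:
    """Concatenate files behind a JSON index of [offset, length] per name."""
    index = {}
    body = bytearray()
    for name, data in files.items():
        index[name] = [len(body), len(data)]
        body += data
    header = json.dumps(index).encode()
    # 4-byte big-endian header length, then the header, then the payload.
    return struct.pack(">I", len(header)) + header + bytes(body)


def unpack_file(blob: bytes, name: str) -> bytes:
    """Read one logical file back out of a packed object."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    index = json.loads(blob[4:4 + header_len])
    offset, length = index[name]
    start = 4 + header_len + offset
    return blob[start:start + length]


part_files = {
    "checksums.txt": b"...",
    "columns.txt": b"...",
    "count.txt": b"42",
    "primary.idx": b"\x00\x01",
}
one_by_one = FakeObjectStore()
for name, data in part_files.items():
    one_by_one.put(f"part_1/{name}", data)  # one PUT per metadata file

packed = FakeObjectStore()
packed.put("part_1/packed.bin", pack_files(part_files))  # a single PUT
print(one_by_one.put_calls, packed.put_calls)  # 4 1
```

The toy only counts metadata uploads for one part. The 30-40% figure refers to Tinybird's overall write API calls, which this example does not attempt to reproduce.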
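The text above says only that ClickHouse has safeguards to keep large scans from evicting hot data; it does not say how they work. One well-known scan-resistant policy is segmented LRU (SLRU): new entries enter a probationary segment and are promoted to a protected segment only when hit again. The sketch below is a generic illustration of that idea, not ClickHouse's cache code; `SLRUCache` and its parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SLRUCache:
    """Segmented LRU cache (hypothetical sketch of a scan-resistant policy).

    New entries land in a probationary segment. Only an entry that is hit
    again while cached is promoted to the protected segment, so a one-off
    scan can churn the probationary segment but never evicts protected data.
    """

    def __init__(self, probation_size: int, protected_size: int) -> None:
        self.probation: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.protected: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self._promote(key, value)  # second hit: treat the entry as hot
            return value
        return None  # miss: caller fetches from object storage, then put()

    def put(self, key: Hashable, value: Any) -> None:
        if key in self.protected or key in self.probation:
            return  # already cached; MergeTree parts are immutable
        self._insert_probation(key, value)

    def _insert_probation(self, key: Hashable, value: Any) -> None:
        self.probation[key] = value
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict least recently used

    def _promote(self, key: Hashable, value: Any) -> None:
        self.protected[key] = value
        while len(self.protected) > self.protected_size:
            # Demote the coldest protected entry instead of dropping it.
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probation(old_key, old_value)


cache = SLRUCache(probation_size=2, protected_size=2)
for key in ["hot1", "hot2"]:
    cache.put(key, b"data")
    cache.get(key)  # second access promotes the entry to protected
for i in range(100):  # a large one-off scan touches each block once
    cache.put(f"scan{i}", b"data")
print(cache.get("hot1") is not None, cache.get("hot2") is not None)  # True True
```

With a plain LRU of the same total capacity (four entries), the 100-entry scan would have evicted both hot entries.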

From tinybird.co
Table of contents
- The high-level architecture
- How data gets in
- How queries work
- How the local cache works
- Zero-copy replication
- Packed part format: cutting S3 costs
- What you don't have to manage
- Further reading
