Tinybird's storage architecture separates compute from storage: object storage (S3 or GCS) is the primary data store, and each replica keeps a local SSD cache. Key components include the Gatherer, which batches streaming events before writing them to avoid a proliferation of small files; zero-copy replication, in which replicas share one copy of the data in S3 instead of each storing its own; and a custom packed part format, developed in Tinybird's ClickHouse fork, that consolidates MergeTree metadata files into a single S3 object and reduces S3 write API calls by 30-40%. Upstream ClickHouse has deprecated zero-copy replication, which is a primary reason Tinybird maintains its own fork. Cache eviction is managed by ClickHouse, with safeguards that prevent large scans from evicting hot data.
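The Gatherer's internals are not described here, so the following is only a minimal sketch of the general batching idea it relies on: buffer incoming events and emit them as one write once a size or age threshold is reached, so object storage receives a few large writes instead of many tiny ones. The class name, the thresholds, and the `flush` callback are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchingBuffer:
    """Hypothetical batcher: accumulates events and hands them off as one batch."""
    flush: Callable[[List[bytes]], None]    # called once per batch (e.g. one part written to object storage)
    max_bytes: int = 8 * 1024 * 1024        # assumed size threshold
    max_wait_s: float = 4.0                 # assumed age threshold for the oldest buffered event
    clock: Callable[[], float] = time.monotonic
    _events: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _first_ts: float = field(default=0.0, init=False)

    def add(self, event: bytes) -> None:
        if not self._events:
            self._first_ts = self.clock()   # start the age timer with the first event of a batch
        self._events.append(event)
        self._size += len(event)
        # Thresholds are only checked on add(); a real service would also flush from a timer.
        if self._size >= self.max_bytes or self.clock() - self._first_ts >= self.max_wait_s:
            self.flush_now()

    def flush_now(self) -> None:
        if self._events:
            self.flush(self._events)
            self._events = []               # new list, so the flushed batch is not mutated later
            self._size = 0


batches: List[List[bytes]] = []
buf = BatchingBuffer(flush=batches.append, max_bytes=10, clock=lambda: 0.0)
for _ in range(25):
    buf.add(b"x")                           # 25 one-byte events
buf.flush_now()
print([len(b) for b in batches])            # [10, 10, 5]
```

Twenty-five events become three writes rather than twenty-five; the trade-off is that data becomes visible only after a flush, which is why a real batcher bounds how long events may wait.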
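The saving from the packed part format comes from issuing one PUT for many small metadata files. The actual on-disk layout of Tinybird's format is not given here; the sketch below assumes a simple illustrative layout (a length-prefixed JSON index of offsets followed by the concatenated payloads), and `pack`, `unpack`, and `put_requests` are hypothetical helpers. The file names in the usage are ordinary MergeTree part files chosen for illustration, not a statement of which files the real format packs.

```python
import json
import struct
from typing import Dict


def pack(files: Dict[str, bytes]) -> bytes:
    """Concatenate small files into one blob: [4-byte header length][JSON index][payloads]."""
    index, payloads, offset = {}, [], 0
    for name in sorted(files):
        data = files[name]
        index[name] = [offset, len(data)]   # where this file starts inside the payload area
        payloads.append(data)
        offset += len(data)
    header = json.dumps(index).encode()
    return struct.pack(">I", len(header)) + header + b"".join(payloads)


def unpack(blob: bytes) -> Dict[str, bytes]:
    """Inverse of pack(): read the index, then slice each file out of the payload area."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    index = json.loads(blob[4:4 + header_len])
    base = 4 + header_len
    return {name: blob[base + off: base + off + size] for name, (off, size) in index.items()}


def put_requests(n_data_files: int, n_metadata_files: int, packed: bool) -> int:
    """PUTs needed to upload one part: one per file, or one per data file plus one packed object."""
    if packed and n_metadata_files:
        return n_data_files + 1
    return n_data_files + n_metadata_files


meta = {"checksums.txt": b"...", "columns.txt": b"...", "count.txt": b"42"}
assert unpack(pack(meta)) == meta
print(put_requests(4, 3, packed=False), put_requests(4, 3, packed=True))  # 7 5
```

How much this saves overall depends on how many metadata files each part has relative to its data files, so the counts above illustrate the mechanism rather than reproduce the 30-40% figure.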
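The text does not say which safeguard ClickHouse uses to keep large scans from evicting hot data. One common scan-resistant policy is a segmented LRU, sketched below as a generic illustration rather than ClickHouse's implementation: new entries land in a probationary segment and are promoted to a protected segment only on a second access, so a one-pass scan churns only the probationary segment. The class name and capacities (counted in entries, not bytes) are assumptions.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class SegmentedLRU:
    """Generic segmented LRU: probation for first-time entries, protection for re-used ones."""

    def __init__(self, probation_cap: int, protected_cap: int) -> None:
        self.probation: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.protected: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.probation_cap = probation_cap
        self.protected_cap = protected_cap

    def get(self, key: Hashable) -> Optional[bytes]:
        if key in self.protected:
            self.protected.move_to_end(key)          # refresh recency
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)          # second access: promote
            self._insert_protected(key, value)
            return value
        return None

    def put(self, key: Hashable, value: bytes) -> None:
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        self.probation[key] = value                  # first-time (or still-probationary) entry
        self.probation.move_to_end(key)
        self._trim_probation()

    def _insert_protected(self, key: Hashable, value: bytes) -> None:
        self.protected[key] = value
        if len(self.protected) > self.protected_cap:
            # Demote the least recently used protected entry instead of dropping it outright.
            old_key, old_value = self.protected.popitem(last=False)
            self.probation[old_key] = old_value
            self._trim_probation()

    def _trim_probation(self) -> None:
        while len(self.probation) > self.probation_cap:
            self.probation.popitem(last=False)       # evict the oldest probationary entry


cache = SegmentedLRU(probation_cap=2, protected_cap=2)
for hot in ("hot1", "hot2"):
    cache.put(hot, b"block")
    cache.get(hot)                                   # re-use promotes the block to protected
for i in range(100):
    cache.put(f"scan{i}", b"block")                  # a large one-pass scan
print(cache.get("hot1") is not None, cache.get("scan0"))  # True None
```

The scan evicts only other scan blocks from the probationary segment, while the re-used blocks stay in the protected segment.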
Table of contents
The high-level architecture
How data gets in
How queries work
How the local cache works
Zero-copy replication
Packed part format: cutting S3 costs
What you don't have to manage
Further reading