As promised in the previous article, this week I’m back with lessons and insights after spending a whole weekend researching the ClickHouse MergeTree engine. The article is structured as follows: first…

Data Engineer Things

The post explores the ClickHouse MergeTree table engine in depth, covering its data organization, write and read paths, merge and mutation mechanisms, and data replication. It discusses key concepts such as the wide and compact formats for column storage, the primary index and mark files, and idempotent inserts. The author explains how rows are inserted and managed within the MergeTree engine, including how background merges are used to optimize performance. The post also covers how data replication provides high availability and increased read throughput.

I spent 8 hours learning the ClickHouse MergeTree Table Engine