Writing transformations as idempotent code makes data pipelines backfillable and avoids the complications of ad-hoc SQL. For Slowly Changing Dimensions (SCDs), SCD Type 2 is often preferred for its immutability and compact storage (a new row is written only when an attribute changes), but it requires complex surrogate key lookups. Snapshot tables, which store a full copy of the dimension for each period, are a simpler and more reproducible alternative at the cost of more data replication. That trade-off suits cloud environments, where storage is cheaper than engineering time.
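To make the trade-off concrete, here is a minimal in-memory sketch, not the article's own code. It assumes a hypothetical customer dimension with a single `plan` attribute, and all function names and the row layout are illustrative. The snapshot write overwrites a whole daily partition, so re-running a backfill for the same day leaves the same state, while the SCD Type 2 lookup has to find the row version whose validity range contains the requested date.

```python
from __future__ import annotations

from datetime import date


def write_snapshot(table: dict, day: date, rows: dict) -> None:
    """Overwrite the partition for `day` (idempotent: same input, same state)."""
    table[day] = dict(rows)


def plan_as_of_snapshot(table: dict, customer_id: int, day: date) -> str:
    """Snapshot lookup: read that day's partition directly."""
    return table[day][customer_id]


def plan_as_of_scd2(rows: list, customer_id: int, day: date) -> str:
    """SCD2 lookup: scan for the version valid on `day`.

    Assumed row layout (hypothetical):
    (surrogate_key, customer_id, plan, valid_from, valid_to),
    where valid_to is exclusive and None marks the current version.
    """
    for _sk, cid, plan, valid_from, valid_to in rows:
        in_range = valid_from <= day and (valid_to is None or day < valid_to)
        if cid == customer_id and in_range:
            return plan
    raise KeyError((customer_id, day))


# Snapshot table: three daily partitions, each a full copy of the dimension.
snapshots: dict = {}
write_snapshot(snapshots, date(2024, 1, 1), {42: "free"})
write_snapshot(snapshots, date(2024, 1, 2), {42: "free"})
write_snapshot(snapshots, date(2024, 1, 3), {42: "pro"})

# SCD Type 2: the same history stored as two versioned rows.
scd2 = [
    (1, 42, "free", date(2024, 1, 1), date(2024, 1, 3)),
    (2, 42, "pro", date(2024, 1, 3), None),
]
```

The snapshot table stores three copies of the row where SCD Type 2 stores two versions, but its lookup is a plain partition read, which is the replication-for-simplicity trade described above.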

5m read time · From juhache.substack.com
Table of contents
SCDs?
🫸 SCD 1 & 3
🧠 SCD 2
🎯 Snapshot table
