Git for Data Applied
A deep-dive comparison of Git-like versioning tools for data, covering LakeFS, Dolt, Nessie, Bauplan, MotherDuck, DuckLake, Neon, and Supabase. Each tool is examined for how it separates metadata from data to enable zero-copy branching, snapshots, time travel, and rollbacks without duplicating petabytes. The tools are grouped into three categories: data lake versioning (object storage), transactional/OLTP databases, and analytical databases. The article also covers related workflows like Dagster branch deployments and AI agent testing on isolated data branches. A comparison table of GitHub activity metrics is included to assess ecosystem health.
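The core idea shared by these tools is that branches and commits are cheap pointers in a metadata layer, while the actual data files are immutable and never copied. The Python sketch below illustrates that pattern under stated assumptions. It is not the API of lakeFS, Dolt, Nessie, or any other tool named here. The `Catalog` and `Commit` classes, their methods, and the in-memory `files` dict (a stand-in for object storage) are all hypothetical, chosen only to show how zero-copy branching, time travel, and rollback can fall out of separating metadata from data.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]
    # Hypothetical layout: table name -> tuple of immutable data file keys.
    tables: Dict[str, Tuple[str, ...]]


class Catalog:
    """Hypothetical metadata layer: branches and commits only point at files."""

    def __init__(self) -> None:
        self._ids = itertools.count()
        root = Commit(next(self._ids), None, {})
        self.commits: Dict[int, Commit] = {root.id: root}
        self.branches: Dict[str, int] = {"main": root.id}
        self.files: Dict[str, List[dict]] = {}  # stand-in for object storage

    def create_branch(self, name: str, source: str = "main") -> None:
        # Zero-copy: a new branch is only a pointer to an existing commit.
        self.branches[name] = self.branches[source]

    def append(self, branch: str, table: str, rows: List[dict]) -> int:
        head = self.commits[self.branches[branch]]
        key = f"data/{table}/part-{len(self.files)}.parquet"
        # Write a new immutable file; existing files are never modified.
        self.files[key] = list(rows)
        tables = dict(head.tables)
        tables[table] = head.tables.get(table, ()) + (key,)
        commit = Commit(next(self._ids), head.id, tables)
        self.commits[commit.id] = commit
        self.branches[branch] = commit.id  # advance only this branch
        return commit.id

    def read(self, table: str, branch: str = "main",
             commit_id: Optional[int] = None) -> List[dict]:
        # Time travel: read any historical commit instead of the branch head.
        cid = commit_id if commit_id is not None else self.branches[branch]
        keys = self.commits[cid].tables.get(table, ())
        return [row for key in keys for row in self.files[key]]

    def rollback(self, branch: str) -> None:
        # Rollback: move the branch pointer back to its parent commit.
        parent = self.commits[self.branches[branch]].parent
        if parent is None:
            raise ValueError("nothing to roll back")
        self.branches[branch] = parent


if __name__ == "__main__":
    cat = Catalog()
    cat.append("main", "orders", [{"id": 1}])
    cat.create_branch("dev")                    # no data copied
    cat.append("dev", "orders", [{"id": 2}])    # isolated from main
    print(cat.read("orders", "main"))
    print(cat.read("orders", "dev"))
```

In this sketch, writing to `dev` adds one new file and one new commit, while `main` still points at its old commit, so the two branches share the original file without duplicating it. Real systems add much more (merges, conflict detection, garbage collection of unreferenced files, concurrency control), but the pointer-over-immutable-files design is what keeps branching cheap regardless of data volume.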