Wix built a Data-to-Production platform that bridges its petabyte-scale Iceberg data warehouse with production microservices that require millisecond latency. The system consists of three layers: a self-service metadata registration console with human-in-the-loop governance; a dynamic Airflow-based ingestion engine that supports three loading strategies (OVERWRITE, UPSERT, PARTITION_REPLACE) through custom idempotent operators; and a type-safe Scala serving layer whose semantic JSON DSL compiles to optimized ClickHouse SQL. The platform handles billions of rows and supports automatic schema evolution and catch-up logic. It includes safety mechanisms such as empty staging protection and row count validation, prevents SQL injection, and enforces resource governance.
Table of contents
The Architecture: A Bird’s-Eye View
1. The Metadata Layer: Registration & Governance
2. The Engine Room: Deep Dive into the Ingestion Pipeline
3. The Serving Layer: Safe, Semantic, and Low-Latency Access
3.1. The Semantic DSL: Querying as Code
3.2. The Compilation Pipeline
3.3. Performance Optimizations & Consistency
3.4. Security:
3.5. Developer Experience
Conclusion
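
To make the serving-layer idea from the summary concrete, here is a minimal sketch of how a JSON-style query description could be compiled into parameterized ClickHouse SQL. It is not Wix's implementation (their serving layer is written in Scala); the table registry, DSL field names (`select`, `where`, `limit`), operator set, and row cap are all hypothetical, chosen only to illustrate allow-listing identifiers, binding values as parameters instead of concatenating them, and capping result size.

```python
from typing import Any

# Hypothetical registry: table name -> {column name -> ClickHouse type}.
SCHEMAS = {"site_stats": {"site_id": "String", "visits": "UInt64", "day": "Date"}}
# Hypothetical DSL operators mapped to SQL operators.
OPS = {"eq": "=", "gt": ">", "lt": "<"}
MAX_LIMIT = 1000  # assumed resource-governance cap on returned rows


def compile_query(table: str, dsl: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Compile a small query description into SQL text plus bound parameters."""
    if table not in SCHEMAS:
        raise ValueError(f"unknown table: {table}")
    schema = SCHEMAS[table]

    columns = dsl.get("select", [])
    if not columns:
        raise ValueError("select must name at least one column")
    for col in columns:
        # Identifiers are never taken on trust: only registered columns pass.
        if col not in schema:
            raise ValueError(f"unknown column: {col}")

    clauses, params = [], {}
    for i, cond in enumerate(dsl.get("where", [])):
        col, op = cond["field"], cond["op"]
        if col not in schema or op not in OPS:
            raise ValueError(f"invalid filter: {cond}")
        name = f"p{i}"
        # Values travel as typed placeholders ({name:Type}), not as SQL text.
        clauses.append(f"{col} {OPS[op]} {{{name}:{schema[col]}}}")
        params[name] = cond["value"]

    # Clamp the requested limit into the range [1, MAX_LIMIT].
    limit = max(1, min(int(dsl.get("limit", MAX_LIMIT)), MAX_LIMIT))

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" LIMIT {limit}"
    return sql, params
```

Calling `compile_query("site_stats", {"select": ["site_id"], "where": [{"field": "day", "op": "gt", "value": "2024-01-01"}]})` yields SQL with a `{p0:Date}` placeholder and a separate parameter map, which a ClickHouse client that supports query parameters could then bind. Because column and table names are checked against the registry and values never enter the SQL string, a malicious input such as `"visits; DROP TABLE x"` is rejected before any SQL is produced.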