Wix built a Data-to-Production platform that bridges its petabyte-scale Iceberg data warehouse with production microservices that require millisecond latency. The system consists of three layers: a self-service metadata registration console with human-in-the-loop governance, a dynamic Airflow-based ingestion engine supporting three loading strategies (OVERWRITE, UPSERT, PARTITION_REPLACE) with custom idempotent operators, and a type-safe Scala serving layer using a semantic JSON DSL that compiles to optimized ClickHouse SQL. The platform handles billions of rows, automatic schema evolution, and catch-up logic. It includes safety mechanisms such as empty staging protection and row count validation, prevents SQL injection, and enforces resource governance.
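
As a rough illustration of the two safety mechanisms named above, the following Python sketch shows what a pre-promotion check on a staging table might look like. It is not Wix's implementation: the function `validate_staging`, the `tolerance` parameter, the exception type, and the choice of which strategies trigger the empty-staging guard are all assumptions made for this example. Only the three strategy names come from the summary.

```python
from enum import Enum


class LoadStrategy(Enum):
    # Strategy names as listed in the summary.
    OVERWRITE = "OVERWRITE"
    UPSERT = "UPSERT"
    PARTITION_REPLACE = "PARTITION_REPLACE"


class StagingValidationError(Exception):
    """Hypothetical error raised when a staged load should not be promoted."""


def validate_staging(strategy, source_rows, staging_rows, tolerance=0.0):
    """Check a staged load before it replaces or merges into live data.

    Assumption: row counts are supplied by the caller (for example from
    COUNT(*) queries on the source and staging tables).
    """
    # Empty staging protection: assumed here to apply to the strategies
    # that replace existing data, where an empty load would wipe it out.
    destructive = (LoadStrategy.OVERWRITE, LoadStrategy.PARTITION_REPLACE)
    if staging_rows == 0 and strategy in destructive:
        raise StagingValidationError(
            f"staging is empty; refusing {strategy.value}"
        )

    # Row count validation: staged rows must match the source count
    # within a relative tolerance (0.0 means an exact match is required).
    if abs(source_rows - staging_rows) > source_rows * tolerance:
        raise StagingValidationError(
            f"row count mismatch: source={source_rows}, staging={staging_rows}"
        )
```

A caller would run this check after loading staging and before swapping or merging, so that a failed or partial extract stops the pipeline instead of replacing good production data.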

10m read time, from wix.engineering
Table of contents
The Architecture: A Bird’s-Eye View
1. The Metadata Layer: Registration & Governance
2. The Engine Room: Deep Dive into the Ingestion Pipeline
3. The Serving Layer: Safe, Semantic, and Low-Latency Access
3.1. The Semantic DSL: Querying as Code
3.2. The Compilation Pipeline
3.3. Performance Optimizations & Consistency
3.4. Security:
3.5. Developer Experience
Conclusion
