The AWAP (Audit-Write-Audit-Publish) protocol is presented as a middle ground between strict schema enforcement, which is brittle, and permissive auto-evolution, which silently corrupts data. It introduces a two-gate validation architecture: a row-level gate (Audit 1) that catches syntactic violations, such as missing primary keys, before they trigger schema mutations, and a batch-level gate (Audit 2) that detects semantic anomalies, such as abnormal NULL rates or suspicious aggregate patterns, before data is promoted to production. A practical example using dlt with a street survey dataset demonstrates how to filter invalid rows in-flight, flag suspicious records, compute per-agent suspicious rates in staging, and block untrusted agents from reaching production tables. Together, the two gates prevent both schema scars (permanent DDL mutations caused by malformed rows) and state corruption (silent overwriting of historical data via MERGE operations).
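The two gates described above can be sketched in plain Python. This is a minimal illustration rather than the article's dlt pipeline: the field names (`survey_id`, `agent_id`, `duration_seconds`), both thresholds, and the rule for what counts as "suspicious" are assumptions made up for the example, and the in-memory lists stand in for staging and production tables.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_DURATION_SECONDS = 30   # shorter interviews are flagged as suspicious
MAX_SUSPICIOUS_RATE = 0.5   # agents above this rate are blocked


def audit_1(rows):
    """Row-level gate: drop rows without a primary key, flag suspicious ones."""
    for row in rows:
        if row.get("survey_id") is None:
            continue  # syntactic violation: the row never reaches staging
        flagged = dict(row)
        duration = row.get("duration_seconds")
        flagged["is_suspicious"] = (
            duration is None or duration < MIN_DURATION_SECONDS
        )
        yield flagged


def suspicious_rate_by_agent(staged_rows):
    """Compute the share of flagged rows per agent over the staged batch."""
    totals = defaultdict(int)
    suspicious = defaultdict(int)
    for row in staged_rows:
        totals[row["agent_id"]] += 1
        suspicious[row["agent_id"]] += int(row["is_suspicious"])
    return {agent: suspicious[agent] / totals[agent] for agent in totals}


def audit_2(staged_rows):
    """Batch-level gate: keep only rows from agents below the rate threshold."""
    rates = suspicious_rate_by_agent(staged_rows)
    blocked = {agent for agent, rate in rates.items() if rate > MAX_SUSPICIOUS_RATE}
    publishable = [row for row in staged_rows if row["agent_id"] not in blocked]
    return publishable, blocked


# Made-up sample batch: one row lacks a primary key, agent "a2" looks unreliable.
raw_rows = [
    {"survey_id": 1, "agent_id": "a1", "duration_seconds": 120},
    {"survey_id": 2, "agent_id": "a1", "duration_seconds": 95},
    {"survey_id": None, "agent_id": "a1", "duration_seconds": 80},
    {"survey_id": 3, "agent_id": "a2", "duration_seconds": 5},
    {"survey_id": 4, "agent_id": "a2", "duration_seconds": None},
    {"survey_id": 5, "agent_id": "a2", "duration_seconds": 60},
]

staged = list(audit_1(raw_rows))          # Audit 1, then write to "staging"
publishable, blocked = audit_2(staged)    # Audit 2, then publish what passes
```

Keeping the row-level check separate from the aggregate check mirrors the split in the summary: the first gate needs only a single row, while the second needs the whole staged batch before it can judge an agent.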
Table of contents
The Two-Gate Architecture: Row vs. Batch
Evolution: From WAP to AWAP
Practical Example: The Street Survey System
How ‘Bad Data’ Surfaces
Audit 1: The Row-Level Gate
Seeing the bigger picture in staging
Audit 2: The Batch-Level Gate
Conclusion: Engineering for Resilience