Row vs. Batch Contracts: Using AWAP to Prevent Schema Scars and State Corruption

The AWAP (Audit-Write-Audit-Publish) protocol is presented as a middle ground between brittle strict schema enforcement and permissive auto-evolution that silently corrupts data. It introduces a two-gate validation architecture: a row-level gate (Audit 1) that catches syntactic violations, such as missing primary keys, before they trigger schema mutations, and a batch-level gate (Audit 2) that detects semantic anomalies, such as abnormal NULL rates or suspicious aggregate patterns, before data is promoted to production. A practical example using dlt with a street survey dataset demonstrates how to filter invalid rows in-flight, flag suspicious records, compute per-agent suspicious rates in staging, and keep rows from untrusted agents out of production tables. This prevents both schema scars (permanent DDL mutations caused by malformed rows) and state corruption (silent overwriting of historical data via MERGE operations).
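The two gates can be sketched in plain Python to make the row/batch distinction concrete. This is a minimal illustration, not the article's dlt pipeline: the field names (`survey_id`, `agent_id`, `response`), the "missing response means suspicious" rule, and the 0.5 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def audit_row(row):
    """Audit 1 (row-level): drop rows without a primary key, flag suspicious ones.

    Hypothetical rule: a row is suspicious when its `response` is missing.
    """
    if row.get("survey_id") is None:
        return None  # syntactic violation: never reaches staging
    checked = dict(row)
    checked["is_suspicious"] = row.get("response") is None
    return checked


def suspicious_rate_by_agent(staged_rows):
    """Share of flagged rows per agent, computed over the staged batch."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in staged_rows:
        totals[row["agent_id"]] += 1
        flagged[row["agent_id"]] += int(row["is_suspicious"])
    return {agent: flagged[agent] / totals[agent] for agent in totals}


def audit_batch(staged_rows, max_rate=0.5):
    """Audit 2 (batch-level): publish only rows from agents under the threshold."""
    rates = suspicious_rate_by_agent(staged_rows)
    trusted = {agent for agent, rate in rates.items() if rate <= max_rate}
    published = [row for row in staged_rows if row["agent_id"] in trusted]
    return published, rates


def run_awap(raw_rows, max_rate=0.5):
    """Audit -> Write (to an in-memory staging list) -> Audit -> Publish."""
    staged = [checked for checked in map(audit_row, raw_rows) if checked is not None]
    published, rates = audit_batch(staged, max_rate)
    return staged, published, rates
```

Staging is an in-memory list here; in a real pipeline it would be a staging table, and the batch-level audit would typically run as a query against that table before anything is merged into production.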

#backend

#data-engineering

#data-quality

Mar 03 • 8m read time • From dlthub.com

Table of contents

The Two-Gate Architecture: Row vs. Batch
Evolution: From WAP to AWAP
Practical Example: The Street Survey System
How ‘Bad Data’ Surfaces
Audit 1: The Row-Level Gate
Seeing the bigger picture in staging
Audit 2: The Batch-Level Gate
Conclusion: Engineering for Resilience
