A comprehensive guide to building production-ready data pipelines in Microsoft Fabric using dltHub (dlt), addressing the platform's lack of a built-in data quality engine. Covers a six-stage data quality lifecycle: source profiling, schema/contract enforcement, pre-load Write-Audit-Publish (WAP) validation, controlled lakehouse loading, monitoring, and iterative improvement. Details two deployment patterns—validating at ingestion (Pattern A) vs. using dlt as a quality gate between Bronze and Silver medallion layers (Pattern B)—with code examples and tradeoff analysis. Also covers PII detection and masking strategies, quarantine table patterns for failed records, and monitoring metrics. Particularly targeted at small data teams (1–2 engineers) who need scalable, low-overhead data quality without dedicated tooling.
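The Write-Audit-Publish validation and quarantine-table pattern summarized above can be sketched in plain Python (no dlt API calls; all names here are illustrative, not taken from the article or the dlt library): records are audited against rule callables before loading, and failures are routed to a quarantine table with the failing rule names attached.

```python
# Minimal WAP-style quarantine sketch. Rule names, record fields, and the
# "_dq_errors" column are hypothetical illustrations, not dlt built-ins.

def audit(records, rules):
    """Split records into (valid, quarantined) using a dict of rule callables."""
    valid, quarantined = [], []
    for rec in records:
        errors = [name for name, check in rules.items() if not check(rec)]
        if errors:
            # Keep the failed record plus the reasons it failed, for later review.
            quarantined.append({**rec, "_dq_errors": errors})
        else:
            valid.append(rec)
    return valid, quarantined

rules = {
    "id_present": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: (r.get("amount") or 0) >= 0,
}

records = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]

valid, quarantined = audit(records, rules)
# "valid" is published to the lakehouse table; "quarantined" is loaded to a
# side table so failures are inspectable without blocking the whole load.
```

In the article's Pattern A this audit step runs inside the dlt pipeline at ingestion; in Pattern B it sits between the Bronze and Silver medallion layers.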
Table of contents
1. Introduction
2. The challenges of data quality in Microsoft Fabric
3. The dltHub solution
4. Mapping the DQ lifecycle to dltHub
5. Protecting sensitive data (PII)
6. Integrating into a Microsoft Fabric pipeline
6.5 Alternative pattern: dlt quality gates between medallion layers
8. Benefits for small teams
9. Conclusion