A data engineering approach that uses a plain-English ontology as a runtime access policy to handle schema evolution automatically. Instead of maintaining a static column allowlist, the team writes the policy as natural-language rules (a taxonomy plus relationships), and an LLM applies those rules column by column using name patterns, data types, cardinality ratios, and value samples. The system handles ambiguous cases such as high-cardinality text columns whose names don't reveal PII: the LLM inspects sampled values to decide. Demonstrated on a fintech dataset with DuckDB and dlt, the approach correctly passed UUID-based user references while rejecting PII columns, and required no code changes when new columns arrived. Limitations include treating numeric columns as safe regardless of content and performing no cross-column re-identification analysis.
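To make the column-by-column decision loop concrete, here is a minimal sketch, not the article's implementation. It assumes rows arrive as plain Python dicts (the article itself uses DuckDB and dlt), and the policy text, prompt format, and the `ask_llm` callable are all hypothetical stand-ins for a real model call. For each column it gathers the evidence the summary lists (name, data type, cardinality ratio, value sample), asks the model for ALLOW or DENY, and treats any other answer as DENY.

```python
import random
from typing import Any, Callable

# Hypothetical policy excerpt; the article's actual ontology wording is not reproduced here.
POLICY = (
    "Direct identifiers (names, emails, phone numbers, street addresses) are PII: DENY.\n"
    "Opaque surrogate keys such as UUID user references are not PII: ALLOW.\n"
    "If a free-text column's name is ambiguous, judge it by its sampled values."
)


def profile_column(name: str, values: list[Any], sample_size: int = 5, seed: int = 0) -> dict[str, Any]:
    """Collect the evidence the model sees: name, type, cardinality ratio, value sample."""
    non_null = [v for v in values if v is not None]
    # Distinct / non-null ratio close to 1.0 hints at identifiers or free text.
    ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(non_null, min(sample_size, len(non_null)))
    dtype = type(non_null[0]).__name__ if non_null else "unknown"
    return {"name": name, "dtype": dtype, "cardinality_ratio": round(ratio, 3), "sample": sample}


def build_prompt(policy: str, profile: dict[str, Any]) -> str:
    """Hypothetical prompt layout: policy first, then the column's evidence."""
    return (
        f"Policy:\n{policy}\n\n"
        f"Column: {profile['name']}\n"
        f"Type: {profile['dtype']}\n"
        f"Cardinality ratio: {profile['cardinality_ratio']}\n"
        f"Sample values: {profile['sample']!r}\n"
        "Answer with exactly ALLOW or DENY."
    )


def classify_columns(rows: list[dict[str, Any]], ask_llm: Callable[[str], str]) -> dict[str, bool]:
    """Apply the policy to every column found in the data, not to a fixed allowlist."""
    # Union of keys across all rows, in first-seen order, so newly arrived columns are included.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    decisions: dict[str, bool] = {}
    for col in columns:
        profile = profile_column(col, [row.get(col) for row in rows])
        answer = ask_llm(build_prompt(POLICY, profile)).strip().upper()
        decisions[col] = answer == "ALLOW"  # anything other than ALLOW fails closed
    return decisions


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model: denies any column whose samples contain an '@'."""
    return "DENY" if "@" in prompt else "ALLOW"


# Illustrative rows; the second batch carries a column the first one lacks.
rows = [
    {"user_id": "9b1d4a2e-0c6f-4f5e-8a3b-2d7c1e6f9a10", "amount": 10.5},
    {"user_id": "1c8e7f3a-5b2d-4e9f-9a6c-0d4b3e2f1a77", "amount": 7.0, "contact": "a@example.com"},
]

if __name__ == "__main__":
    print(classify_columns(rows, fake_llm))
```

Building the column list from the data rather than from a hard-coded schema is what lets a column that only shows up in a later batch (`contact` above) still be evaluated without code changes. Failing closed on unexpected model output is a design choice assumed here, not something the summary states.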

9 min read · From dlthub.com
Table of contents
Intro
The ontology encodes the policy in plain English
The policy holds when the schema changes
What the ontology actually bought
The limits worth knowing
When to use this
Try it
