A deep dive into using Rust typestates and phantom types to enforce compile-time correctness when writing VCF/BCF genomic data files. The author walks through a header builder with phase-based state transitions, typed keys using uninhabited enums, and a record encoder state machine — all designed to make misuse a compile error rather than a runtime bug. The post also covers benchmark results showing seqair outperforms both htslib and noodles, discusses the limits of the type system (e.g., BCF integer sentinel bugs caught only by property tests), and reflects on API design tradeoffs like key.encode() vs encoder.encode().
Table of contents
The header builder
Typed keys and phantom types
The record encoder typestate
Why key.encode() and not encoder.encode()
Domain types encoding themselves
Where the type system ends
Building the index while writing
Benchmarks
What I’d do differently
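As a quick orientation before the sections listed above, here is a minimal sketch of the phase-based typestate idea the summary describes. Everything in it is an assumption made for illustration: the names `HeaderBuilder`, `Contigs`, `Fields`, and `demo_header`, and the choice of two phases, are hypothetical and are not taken from seqair's actual API. The phase markers are uninhabited enums, so they exist only at the type level, and `PhantomData` carries the current phase without storing any runtime value.

```rust
use std::marker::PhantomData;

// Hypothetical phase markers. Uninhabited enums have no values,
// so they can only ever be used as type parameters.
enum Contigs {}
enum Fields {}

// Hypothetical builder; the current phase lives in the type parameter.
struct HeaderBuilder<Phase> {
    lines: Vec<String>,
    _phase: PhantomData<Phase>,
}

impl HeaderBuilder<Contigs> {
    fn new() -> Self {
        HeaderBuilder {
            lines: vec!["##fileformat=VCFv4.3".to_string()],
            _phase: PhantomData,
        }
    }

    // Contig lines can only be added during the Contigs phase.
    fn contig(mut self, name: &str, len: u64) -> Self {
        self.lines.push(format!("##contig=<ID={},length={}>", name, len));
        self
    }

    // Consuming `self` makes the transition one-way: the old
    // builder value is moved and cannot be used afterwards.
    fn fields(self) -> HeaderBuilder<Fields> {
        HeaderBuilder { lines: self.lines, _phase: PhantomData }
    }
}

impl HeaderBuilder<Fields> {
    // INFO definitions are only available in the Fields phase.
    fn info(mut self, id: &str, number: &str, ty: &str, desc: &str) -> Self {
        self.lines.push(format!(
            "##INFO=<ID={},Number={},Type={},Description=\"{}\">",
            id, number, ty, desc
        ));
        self
    }

    // `build` exists only in the final phase, so a header cannot
    // be produced before the transition has happened.
    fn build(mut self) -> String {
        self.lines
            .push("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO".to_string());
        self.lines.join("\n")
    }
}

// Example usage: the calls must appear in phase order to compile.
fn demo_header() -> String {
    HeaderBuilder::<Contigs>::new()
        .contig("chr1", 248_956_422)
        .fields()
        .info("DP", "1", "Integer", "Total depth")
        .build()
}
```

In this sketch, calling `.contig(...)` after `.fields()`, or `.build()` before it, fails to compile because those methods do not exist on the builder's type in that phase. That is the sense in which misuse becomes a compile error rather than a runtime bug.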