Seqair is a new Rust-native library for bioinformatics file formats (SAM/BAM/CRAM/VCF/BCF), built as an alternative to htslib and rust-htslib. The core design innovation is a columnar record store for BAM data: instead of allocating on the heap for each record, it stores variable-length fields (names, bases, CIGAR, quality, aux tags) in contiguous slabs addressed by offsets. This keeps allocations in the hot loop near zero and makes in-place realignment cheap, because a realigned record only needs new CIGAR data appended to the slab. Other highlights include a type-state builder API for VCF/BCF writing, a forkable reader design that uses Arc to share immutable state across rayon worker threads, and a RegionBuf that merges BAM index chunks into a minimal number of large reads to reduce NFS round-trips on HPC clusters. Benchmarks show seqair is competitive with htslib on pileup and ahead on BCF writing, with a notable 2.5x win on BCF. The CRAM implementation was largely generated by Claude Code from the spec and is acknowledged as not production-ready.
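
The slab-and-offset layout described above can be sketched roughly as follows. This is a minimal illustration, not seqair's actual API: the names `RecordStore`, `RecordEntry`, `Span`, `set_cigar`, and `cigar_op` are all hypothetical, and only two of the variable-length fields (names and CIGAR) are modeled. The only detail taken from outside the summary is BAM's CIGAR packing (`op_len << 4 | op`).

```rust
// Hypothetical sketch of a columnar record store; names are illustrative only.

/// A range inside one of the shared slabs.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: u32,
    len: u32,
}

/// Fixed-size per-record entry. Variable-length data lives in the slabs.
#[derive(Clone, Copy, Debug)]
struct RecordEntry {
    pos: i64,
    name: Span,
    cigar: Span,
}

#[derive(Default)]
struct RecordStore {
    entries: Vec<RecordEntry>,
    names: Vec<u8>,   // all read names, back to back
    cigars: Vec<u32>, // all CIGAR ops, back to back
}

/// Pack one CIGAR op the way BAM does: length in the high bits, op code in the low 4.
fn cigar_op(len: u32, code: u32) -> u32 {
    (len << 4) | code
}

impl RecordStore {
    /// Append a record; the slabs grow, but no per-record allocation happens.
    fn push(&mut self, pos: i64, name: &[u8], cigar: &[u32]) -> usize {
        let name_span = Span { start: self.names.len() as u32, len: name.len() as u32 };
        self.names.extend_from_slice(name);
        let cigar_span = Span { start: self.cigars.len() as u32, len: cigar.len() as u32 };
        self.cigars.extend_from_slice(cigar);
        self.entries.push(RecordEntry { pos, name: name_span, cigar: cigar_span });
        self.entries.len() - 1
    }

    fn pos(&self, idx: usize) -> i64 {
        self.entries[idx].pos
    }

    fn name(&self, idx: usize) -> &[u8] {
        let s = self.entries[idx].name;
        &self.names[s.start as usize..(s.start + s.len) as usize]
    }

    fn cigar(&self, idx: usize) -> &[u32] {
        let s = self.entries[idx].cigar;
        &self.cigars[s.start as usize..(s.start + s.len) as usize]
    }

    /// Realign a record: append the new CIGAR to the slab and repoint the entry.
    /// The old ops stay in the slab as dead space until `clear` is called.
    fn set_cigar(&mut self, idx: usize, new_pos: i64, cigar: &[u32]) {
        let span = Span { start: self.cigars.len() as u32, len: cigar.len() as u32 };
        self.cigars.extend_from_slice(cigar);
        let entry = &mut self.entries[idx];
        entry.pos = new_pos;
        entry.cigar = span;
    }

    /// Reset for the next region; `Vec::clear` keeps capacity, so buffers are reused.
    fn clear(&mut self) {
        self.entries.clear();
        self.names.clear();
        self.cigars.clear();
    }
}
```

Under these assumptions, loading a region is a series of `push` calls into buffers that were already grown by earlier regions, and realignment never moves or resizes existing data. The trade-off in this sketch is that superseded CIGAR ops are only reclaimed when the whole store is cleared.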

17m read time, from deterministic.space
Table of contents
An experiment
A Rustic API
A columnar record store for BAM
BGZF and the shape of cluster I/O
Forkable readers
A note on CRAM and testing
Performance
What I got out of it
