A developer shares implementation notes on Rastair, a Rust-based bioinformatics CLI tool for variant and methylation calling from TAPS genome sequencing data. Key engineering highlights include parallelizing work across all CPU cores with Rayon and an ordered channel pattern, and reducing allocation overhead through iterator chains, SmallVec for short lists, SmolStr for short strings, and a switch to the mimalloc allocator. The post also covers integrating with the C library htslib through rust-htslib (including a fork that eliminates unnecessary CString allocations), a two-phase, suffix-based read deduplication algorithm, and auto-generating documentation for the CLI and the VCF fields from Rust macros. A teaser mentions an upcoming post on porting a Random Forest model to run as a GPU compute shader.
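
The summary names an "ordered channel pattern" alongside Rayon but gives no details. As a rough illustration of the general idea only, here is a minimal sketch that swaps Rayon for the standard library (scoped threads plus `std::sync::mpsc`) so it compiles without external crates. Each worker tags its result with the input index, and the consumer parks out-of-order results in a `BTreeMap` until the next expected index arrives. `ordered_parallel` and `process_chunk` are hypothetical names, and the per-chunk work is a placeholder, not Rastair's actual calling logic.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for real per-chunk work (e.g. calling one genomic region).
fn process_chunk(chunk: &[u32]) -> u64 {
    chunk.iter().map(|&x| u64::from(x) * u64::from(x)).sum()
}

/// Processes `chunks` on `n_workers` threads and returns results in input order.
/// Sketch of an ordered-channel pattern using std threads, not Rayon.
fn ordered_parallel(chunks: &[Vec<u32>], n_workers: usize) -> Vec<u64> {
    assert!(n_workers > 0, "need at least one worker");
    let (tx, rx) = mpsc::channel::<(usize, u64)>();
    let mut out = Vec::with_capacity(chunks.len());

    thread::scope(|s| {
        for worker in 0..n_workers {
            let tx = tx.clone();
            s.spawn(move || {
                // Each worker takes every n_workers-th chunk, starting at its own id.
                for idx in (worker..chunks.len()).step_by(n_workers) {
                    // Tag the result with its input index so order can be restored.
                    tx.send((idx, process_chunk(&chunks[idx])))
                        .expect("receiver is alive for the whole scope");
                }
            });
        }
        // Drop the original sender so `rx` ends once every worker has finished.
        drop(tx);

        // Buffer results that arrive early; emit them once they become contiguous.
        let mut pending = BTreeMap::new();
        let mut next = 0;
        for (idx, result) in rx {
            pending.insert(idx, result);
            while let Some(r) = pending.remove(&next) {
                out.push(r);
                next += 1;
            }
        }
    });
    out
}
```

The reorder buffer makes the output deterministic regardless of thread timing, which is presumably what matters for a tool that writes records in genomic order. The trade-off is that one slow chunk holds back everything behind it, so the buffer can grow while it waits.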

From deterministic.space (10 min read)
Table of contents: Intro; What Rastair does; Putting all CPU cores to work; Reducing allocations; There’s a big C library; Small Bonus Challenge; Nice documentation; Further notes