A deep dive into why bzip2 (and bzip3) outperforms LZ77-based compression algorithms like gzip, zstd, xz, and brotli on text and code. The author, motivated by the need to compress Lua code for a Minecraft mod with limited disk space, benchmarks multiple compressors and finds that bzip2 achieves the best ratio on code. The post explains the fundamental algorithmic difference: bzip uses the Burrows-Wheeler Transform (BWT) instead of LZ77, so rather than searching for earlier occurrences of a string, it groups characters by their context. This makes bzip deterministic and free of heuristics, and it makes a compact custom decoder easier to implement. The author also discusses decoder-size tradeoffs, performance nuances, and why adding code-structure-aware preprocessing rarely improves ratios over general-purpose BWT-based compression.
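To make "grouping characters by context" concrete, here is a naive BWT sketch in Python. It is not the author's code and not bzip2's implementation: it appends a `$` sentinel and sorts every rotation directly, which is quadratic and only suitable for tiny inputs. Real bzip2 uses a faster block-sorting algorithm and stores the index of the original row instead of a sentinel, and it runs further stages after the BWT that this sketch omits. The function names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform (illustrative only).

    Appends a unique sentinel, sorts every rotation of the string, and
    returns the last column. Characters whose *following* context is
    similar end up next to each other in the output.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table column by column.

    No search or heuristics are involved; the output is fully determined
    by the transformed string.
    """
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa: the a's and n's cluster together
    sample = "local x = 1\nlocal y = 2\nlocal z = 3\n"
    transformed = bwt(sample)
    print(repr(transformed))
    assert inverse_bwt(transformed) == sample
```

For `banana` the output is `annb$aa`: characters that precede the same following text land next to each other, producing runs that later stages can encode cheaply. The round-trip check shows that decoding is fully determined by the transformed data, with no match-finding choices involved.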

From purplesyringa.moe · 10 min read