A patch submitted to the Linux kernel mailing list introduces an ARM64-optimized CRC64-NVMe implementation built on the NEON Polynomial Multiply Long (PMULL) instructions. Developed by Demian Shulhan, it is written with C intrinsics (arm_neon.h) rather than raw assembly for better readability. Key design choices include processing data in 4 KB chunks to avoid preemption latency spikes, pre-calculated fold constants, and a fallback to the generic implementation for buffers under 128 bytes and on big-endian systems. Benchmarks on a Cortex-A72 show throughput rising from ~268 MB/s to ~1556 MB/s at a 4096-byte buffer size, roughly a 5.8x improvement. The patch is currently under review.
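
The chunking and fallback logic described above can be illustrated with a small user-space sketch. This is not the code from the patch: the function names, the `simd_usable()` check, and the exact loop shape are assumptions made for illustration, and the "accelerated" routine is only a stand-in that calls the bitwise reference so the example runs on any machine. The thresholds (128 bytes, 4 KB) come from the summary; the reflected polynomial is the standard CRC64-NVMe one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC64-NVMe polynomial (0xAD93D23594C93659 reversed). */
#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL
#define ACCEL_MIN_LEN 128u   /* below this, use the generic path */
#define ACCEL_CHUNK   4096u  /* maximum bytes per accelerated call */

/* Generic bit-at-a-time update (no init/final inversion applied here). */
static uint64_t crc64_nvme_bitwise(uint64_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
    }
    return crc;
}

/* Hypothetical stand-in for the PMULL folding routine. A real version
 * would use arm_neon.h intrinsics; this one reuses the bitwise code. */
static uint64_t crc64_nvme_accel_standin(uint64_t crc, const uint8_t *p,
                                         size_t len)
{
    return crc64_nvme_bitwise(crc, p, len);
}

/* Hypothetical capability check (CPU feature, little-endian, SIMD allowed). */
static bool simd_usable(void)
{
    return true;
}

/* Dispatch: accelerated path in chunks of at most 4 KB, generic path for
 * short buffers and for any tail shorter than the threshold. */
static uint64_t crc64_nvme_update(uint64_t crc, const uint8_t *p, size_t len)
{
    if (simd_usable()) {
        while (len >= ACCEL_MIN_LEN) {
            size_t chunk = len < ACCEL_CHUNK ? len : ACCEL_CHUNK;

            /* In kernel code, the SIMD section would presumably be opened
             * and closed around each chunk (assumption, not taken from the
             * patch), so preemption is never held off for long. */
            crc = crc64_nvme_accel_standin(crc, p, chunk);
            p += chunk;
            len -= chunk;
        }
    }
    if (len)
        crc = crc64_nvme_bitwise(crc, p, len);
    return crc;
}
```

Because a CRC update is a streaming operation, splitting the buffer at chunk boundaries does not change the result; the split only bounds how long each accelerated call runs. A caller wanting the final CRC64-NVMe value would start from an all-ones state and invert the result, for example `~crc64_nvme_update(~0ULL, buf, len)`.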

From phoronix.com