Even Faster asin() Was Staring Right At Me


A follow-up post exploring how applying Estrin's Scheme to a Cg-based asin() polynomial approximation reduces the dependency chain from three to two, enabling instruction-level parallelism on modern out-of-order CPUs. Benchmarks across Intel, AMD, and Apple M4 hardware with GCC, Clang, and MSVC show up to 1.88x speedup over std::asin() on Intel (Windows/MSVC), with modest gains on AMD and Apple silicon. Real-world ray tracer tests show a ~3% improvement on Intel Linux. The post also briefly discusses why LUT-based approaches and SIMD weren't pursued, and emphasizes the importance of measuring before and after any optimization.
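To make the dependency-chain point concrete, here is a minimal sketch of the same cubic evaluated with Horner's method and with Estrin's Scheme. It is not the post's code. The coefficients are an assumption: they are the Abramowitz and Stegun 4.4.45 values that the Cg reference implementation of asin() is generally understood to use, and the function names (`poly_horner`, `poly_estrin`, `asin_horner`, `asin_estrin`) are hypothetical.

```cpp
#include <cmath>

// ASSUMPTION: coefficients of asin(x) ~= pi/2 - sqrt(1 - x) * p(x) for x in [0, 1],
// taken from Abramowitz & Stegun 4.4.45, not copied from the post itself.
constexpr double A0 = 1.5707288;
constexpr double A1 = -0.2121144;
constexpr double A2 = 0.0742610;
constexpr double A3 = -0.0187293;
constexpr double HALF_PI = 1.5707963267948966;

// Horner's method: each multiply-add needs the previous result,
// so the cubic is a chain of three dependent steps.
inline double poly_horner(double x) {
    return ((A3 * x + A2) * x + A1) * x + A0;
}

// Estrin's Scheme: x2, lo, and hi do not depend on each other, so an
// out-of-order CPU can compute them in parallel; only the final
// combine has to wait, leaving two dependent levels.
inline double poly_estrin(double x) {
    const double x2 = x * x;        // level 1, independent
    const double lo = A1 * x + A0;  // level 1, independent
    const double hi = A3 * x + A2;  // level 1, independent
    return hi * x2 + lo;            // level 2, combines the halves
}

// Shared tail: apply the sqrt term, then restore the sign,
// using asin(-x) = -asin(x).
inline double finish_asin(double x, double ax, double p) {
    const double r = HALF_PI - std::sqrt(1.0 - ax) * p;
    return std::copysign(r, x);
}

// Hypothetical wrappers; the input is expected to lie in [-1, 1].
inline double asin_horner(double x) {
    const double ax = std::fabs(x);
    return finish_asin(x, ax, poly_horner(ax));
}

inline double asin_estrin(double x) {
    const double ax = std::fabs(x);
    return finish_asin(x, ax, poly_estrin(ax));
}
```

Both variants compute the same polynomial, so they should agree to within floating-point rounding. Any speed difference comes only from how the work is scheduled, which is why it has to be measured per CPU and compiler rather than assumed.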

7m read time. From 16bpp.net
Table of contents
Gotta Go Fast
Benchmark Measurements
Ray Tracer Measurements
Last Words (and Opinions)
