Even Faster asin() Was Staring Right At Me
A follow-up post exploring how applying Estrin's Scheme to a Cg-based asin() polynomial approximation reduces the dependency chain from three to two, enabling instruction-level parallelism on modern out-of-order CPUs. Benchmarks across Intel, AMD, and Apple M4 hardware with GCC, Clang, and MSVC show up to 1.88x speedup over std::asin() on Intel (Windows/MSVC), with modest gains on AMD and Apple silicon. Real-world ray tracer tests show a ~3% improvement on Intel Linux. The post also briefly discusses why LUT-based approaches and SIMD weren't pursued, and emphasizes the importance of measuring before and after any optimization.
Table of contents
Gotta Go Fast
Benchmark Measurements
Ray Tracer Measurements
Last Words (and Opinions)
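As a rough illustration of the idea in the summary, here is a minimal sketch of a Cg-style asin() polynomial evaluated two ways: with Horner's method (three dependent multiply-add steps) and with Estrin's Scheme (two levels, whose pieces can be computed independently). The coefficients are the ones I believe the Cg reference implementation uses, which come from Abramowitz & Stegun 4.4.45. The function names `asin_horner` and `asin_estrin`, and the use of `double`, are assumptions made for this sketch and may not match the code benchmarked in the post.

```cpp
#include <cmath>

// Coefficients from Abramowitz & Stegun 4.4.45 (assumed to match the Cg asin()).
constexpr double A0 = 1.5707288;
constexpr double A1 = -0.2121144;
constexpr double A2 = 0.0742610;
constexpr double A3 = -0.0187293;
constexpr double HALF_PI = 1.5707963267948966;

// Horner's method: each step needs the previous result,
// so the three multiply-adds form one serial chain.
double asin_horner(double x) {
    const double ax = std::fabs(x);
    double p = A3;
    p = p * ax + A2;
    p = p * ax + A1;
    p = p * ax + A0;
    const double r = HALF_PI - std::sqrt(1.0 - ax) * p;
    return std::copysign(r, x);  // asin is odd: restore the sign of x
}

// Estrin's Scheme: lo, hi, and x2 do not depend on each other,
// so an out-of-order CPU can compute them in parallel before the final combine.
double asin_estrin(double x) {
    const double ax = std::fabs(x);
    const double x2 = ax * ax;        // independent
    const double lo = A0 + A1 * ax;   // independent
    const double hi = A2 + A3 * ax;   // independent
    const double p  = lo + x2 * hi;   // second (final) level
    const double r  = HALF_PI - std::sqrt(1.0 - ax) * p;
    return std::copysign(r, x);
}
```

Both functions evaluate the same cubic, so they should agree to within floating-point rounding; the only difference is the order of operations. Whether the shorter chain actually helps depends on the CPU and compiler, which is why the measurements matter.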