NVIDIA achieved single-digit microsecond latency for LSTM inference on the GH200 Grace Hopper Superchip, matching or beating specialized FPGA hardware in the STAC-ML Markets (Inference) Tacana benchmark. Key results include 4.61–4.70 µs p99 latency for LSTM_A and 6.88–7.10 µs for LSTM_B. The post details the custom CUDA kernel

13m read time · From developer.nvidia.com
Table of contents
STAC-ML benchmarking in financial services
NVIDIA key STAC-ML results
Low-latency LSTM inference on GPUs
How to build and run the low-latency LSTM inference reference implementation
Get started with low-latency inference
