NVIDIA offers tools like Perf Analyzer and Model Analyzer to help optimize ML inference performance, particularly for large language models (LLMs), by measuring metrics such as time to first token, output token throughput, and inter-token latency. The latest tool, GenAI-Perf, introduced with NVIDIA Triton, provides accurate
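
To make the three metrics named above concrete, here is a minimal Python sketch that derives them from per-token arrival timestamps. It uses common definitions of these metrics and is not taken from GenAI-Perf, Perf Analyzer, or Model Analyzer; the `RequestTrace` class and all function names are hypothetical, introduced only for illustration, and the real tools may compute these values differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timestamps in seconds for one streamed request (hypothetical structure)."""
    sent_at: float            # when the client sent the request
    token_times: List[float]  # arrival time of each output token, in order


def time_to_first_token(trace: RequestTrace) -> float:
    """Delay between sending the request and receiving the first output token."""
    return trace.token_times[0] - trace.sent_at


def inter_token_latency(trace: RequestTrace) -> float:
    """Mean gap between consecutive output tokens, excluding the first token."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0  # assumption: a single-token response has no inter-token gap
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def output_token_throughput(traces: List[RequestTrace]) -> float:
    """Total output tokens divided by the wall-clock span of the whole run."""
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / (end - start)


# Two made-up traces; the timestamps are illustrative, not measured data.
first = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
second = RequestTrace(sent_at=0.5, token_times=[1.0, 1.5, 2.0])

print(time_to_first_token(first))                # 0.5
print(inter_token_latency(first))                # 0.25
print(output_token_throughput([first, second]))  # 4.0
```

Time to first token and inter-token latency are per-request values, while throughput is computed over the whole run, so concurrent requests raise throughput without necessarily improving either latency figure.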