NVIDIA collaborates with Google to deliver Gemma, an optimized family of open models built for high throughput. TensorRT-LLM boosts Gemma's performance with optimizations and kernels including FP8 quantization, the XQA attention kernel, and INT4 AWQ weight quantization.
4 min read · From developer.nvidia.com
Table of contents
- TensorRT-LLM makes Gemma models faster
- Real-time performance with over 79K tokens per second
- Get started now