NVIDIA has collaborated with Google to deliver Gemma, a family of open models optimized for high throughput on NVIDIA GPUs. TensorRT-LLM provides kernels and optimizations that boost Gemma's performance, including FP8 quantization, the XQA attention kernel, and INT4 AWQ weight quantization.
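To make the INT4 AWQ technique mentioned above concrete, here is a minimal, hedged NumPy sketch of activation-aware 4-bit weight quantization. This is an illustration of the general idea (scale salient input channels by their activation magnitude before rounding, then fold the scale back out at dequantization), not TensorRT-LLM's actual implementation; the function name, `alpha = 0.5` exponent, and `group_size` default are assumptions for the example.

```python
import numpy as np

def int4_awq_quantize(w, act_scale, group_size=128):
    """Illustrative sketch of activation-aware INT4 weight quantization.

    w: (out_features, in_features) weight matrix.
    act_scale: per-input-channel activation magnitudes (assumed collected
    from calibration data). Channels with large activations are scaled up
    before rounding so their quantization error shrinks; the inverse scale
    is applied again at dequantization, leaving the layer output unchanged
    up to rounding error.
    """
    # Per-channel AWQ scale; alpha = 0.5 is a common choice (assumption here)
    s = act_scale ** 0.5
    s = s / s.mean()                 # keep overall weight magnitude stable
    w_scaled = w * s                 # emphasize activation-salient channels

    out, cin = w_scaled.shape
    w_groups = w_scaled.reshape(out, cin // group_size, group_size)
    # Symmetric per-group scale mapping the max magnitude to the int4 range
    q_scale = np.abs(w_groups).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(w_groups / q_scale), -8, 7).astype(np.int8)

    # Dequantize: undo the group scale, then the per-channel AWQ scale
    w_hat = (q * q_scale).reshape(out, cin) / s
    return q, q_scale, s, w_hat

# Example usage with random weights and calibration statistics
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 256))
act = np.abs(rng.normal(size=256)) + 0.1
q, qs, s, w_hat = int4_awq_quantize(w, act)
```

The payoff in practice is memory bandwidth: each weight occupies 4 bits plus a small per-group scale, which is what lets quantized Gemma variants fit on smaller GPUs and decode faster.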

4 min read · From developer.nvidia.com
Table of contents
- TensorRT-LLM makes Gemma models faster
- Real-time performance with over 79K tokens per second
- Get started now
