Databricks is integrating NVIDIA TensorRT-LLM into its inference stack for serving Large Language Models. TensorRT-LLM is an open-source library for state-of-the-art, high-performance LLM inference on NVIDIA GPUs. Databricks customers can use the resulting inference servers through AI Playground.
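To illustrate the kind of Python interface the post refers to, here is a minimal sketch using TensorRT-LLM's high-level LLM API; the model name, sampling parameters, and exact API surface are assumptions and may differ across TensorRT-LLM releases.

```python
# Minimal sketch of the TensorRT-LLM high-level Python API (assumed; details
# vary by release). The model name and sampling values are placeholders.
from tensorrt_llm import LLM, SamplingParams

# Build (or load) a TensorRT engine for a Hugging Face model and run it in-process.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["What is TensorRT-LLM?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```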

Table of contents
- Introducing NVIDIA TensorRT-LLM
- Flexibility Through Plugins
- Python API for Easier Integration
- Ready to Begin Experimenting?
