Databricks is integrating NVIDIA TensorRT-LLM into its inference stack for serving Large Language Models. TensorRT-LLM is an open-source library for state-of-the-art, high-performance LLM inference on NVIDIA GPUs. Databricks customers can use the resulting inference servers through AI Playground.
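To illustrate the kind of Python interface the post refers to, here is a minimal sketch using TensorRT-LLM's high-level LLM API; the model name, sampling parameters, and exact API surface are assumptions and may differ across TensorRT-LLM releases.

```python
# Minimal sketch of the TensorRT-LLM high-level Python API (assumed; details
# vary by release). The model name and sampling values are placeholders.
from tensorrt_llm import LLM, SamplingParams

# Build (or load) a TensorRT engine for a Hugging Face model and run it in-process.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
outputs = llm.generate(["What is TensorRT-LLM?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```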

Table of contents
- Introducing NVIDIA TensorRT-LLM
- Flexibility Through Plugins
- Python API for Easier Integration
- Ready to Begin Experimenting?
