Microsoft and NVIDIA released Part 2 of their NVIDIA Dynamo collaboration, introducing automated resource planning and dynamic scaling for LLM inference on Azure Kubernetes Service. The release features two key components: the Dynamo Planner Profiler, which automates configuration searches to optimize GPU allocation for prefill

4m read time From infoq.com