Microsoft and NVIDIA released Part 2 of their NVIDIA Dynamo collaboration, introducing automated resource planning and dynamic scaling for LLM inference on Azure Kubernetes Service. The release features two key components: the Dynamo Planner Profiler, which automates configuration searches to optimize GPU allocation for prefill