AIConfigurator is an open-source tool that automates configuration optimization for LLM serving deployments. Instead of exhaustive GPU testing, it decomposes inference into primitive operations (GEMM, attention, MoE dispatch), benchmarks them in isolation, and reassembles the measurements to estimate end-to-end performance across thousands of configurations in seconds. It supports disaggregated and aggregated serving modes, outputs Pareto frontier tradeoff visualizations, and generates ready-to-deploy Kubernetes artifacts. The tool now supports TensorRT-LLM, SGLang, and vLLM backends via a framework-agnostic abstraction layer, with community contributions from Mooncake and Alibaba. Alibaba's integration achieved 1.86x throughput on Qwen3-235B-FP8, and their HiSim simulator extends AIConfigurator's static analysis to dynamic traffic modeling with under 5% error. The roadmap includes deeper Dynamo platform integration, automated silicon data collection, and dynamic workload modeling.
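To make the decompose-and-reassemble idea concrete, here is a minimal Python sketch. It is not AIConfigurator's code or API: the latency table, the `estimate` cost model, the layer count, and the batch scaling factor are all hypothetical values chosen for illustration. It sums made-up per-operation latencies into a per-step estimate for each configuration, then keeps only the configurations on the Pareto frontier of per-user speed versus per-GPU throughput.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical per-operation latencies in milliseconds, keyed by
# (operation, tensor_parallel_size). A real tool would fill this table
# from isolated GPU benchmarks; these numbers are invented.
OP_LATENCY_MS = {
    ("gemm", 1): 8.0, ("gemm", 2): 4.5, ("gemm", 4): 2.8,
    ("attention", 1): 4.0, ("attention", 2): 2.4, ("attention", 4): 1.6,
    ("moe_dispatch", 1): 0.0, ("moe_dispatch", 2): 0.6, ("moe_dispatch", 4): 1.0,
}
OPS = ("gemm", "attention", "moe_dispatch")
NUM_LAYERS = 4  # assumed toy model depth


@dataclass(frozen=True)
class Estimate:
    tp: int                       # tensor-parallel size (GPUs per replica)
    batch: int                    # concurrent requests per replica
    tokens_per_s_per_user: float  # interactivity seen by one request
    tokens_per_s_per_gpu: float   # throughput normalized by GPU count


def estimate(tp: int, batch: int) -> Estimate:
    """Reassemble per-op latencies into a decode-step estimate (toy model)."""
    per_layer_ms = sum(OP_LATENCY_MS[(op, tp)] for op in OPS)
    # Assumption: step latency grows 5% per extra request in the batch.
    step_ms = NUM_LAYERS * per_layer_ms * (1 + 0.05 * (batch - 1))
    per_user = 1000.0 / step_ms
    return Estimate(tp, batch, per_user, per_user * batch / tp)


def dominates(a: Estimate, b: Estimate) -> bool:
    """True if a is at least as good as b on both axes and better on one."""
    return (
        a.tokens_per_s_per_user >= b.tokens_per_s_per_user
        and a.tokens_per_s_per_gpu >= b.tokens_per_s_per_gpu
        and (
            a.tokens_per_s_per_user > b.tokens_per_s_per_user
            or a.tokens_per_s_per_gpu > b.tokens_per_s_per_gpu
        )
    )


def pareto_frontier(estimates: list[Estimate]) -> list[Estimate]:
    """Keep only configurations that no other configuration dominates."""
    return [e for e in estimates if not any(dominates(o, e) for o in estimates)]


# Sweep a small hypothetical search space without touching a GPU.
all_estimates = [estimate(tp, b) for tp, b in product((1, 2, 4), (1, 4, 16))]
frontier = pareto_frontier(all_estimates)

if __name__ == "__main__":
    for e in sorted(frontier, key=lambda e: e.tokens_per_s_per_user):
        print(e)
```

Because each estimate is just a table lookup and a few multiplications, sweeping thousands of configurations this way is cheap, which is the property the summary attributes to the tool. The frontier then exposes the tradeoff: larger batches raise per-GPU throughput while lowering the speed each user sees.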
Table of contents
Using AIConfigurator to configure disaggregated serving
Extending support to multiple frameworks
WideEP inference for SGLang
How the SGLang community is contributing
What’s next for AIConfigurator