Modal demonstrates how to scale ComfyUI image-generation workflows as APIs using serverless autoscaling. Three approaches are tested: one container per input (4.4s median response, $0.18/min), concurrent processing on a single container (32s response, $0.02/min), and warm container pools (faster cold starts, $0.09/min baseline cost). Load testing with 100 concurrent users shows Modal scaling automatically to 62 GPUs. The right choice depends on balancing inference speed, cost, and traffic patterns.

6 min read · From modal.com
Table of contents

- Load testing with Locust
- Option 1: Run one container per input
- Option 2: Run multiple inputs on one container
- Option 3: Maintain a warm pool with min_containers
- Scaling to 100 concurrent users
- Conclusion
- Coda: Deploying with Comfy Deploy
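The three options above map to configuration knobs on a Modal function. The sketch below is illustrative only, not code from the article: the app name, GPU type, and function bodies are assumptions, and `max_inputs=10` and `min_containers=2` are placeholder values. Only `min_containers` is named in the post itself; the rest follows Modal's documented decorator API.

```python
import modal

app = modal.App("comfyui-scaling-sketch")  # hypothetical app name

# Option 1: one container per input (Modal's default). Every request
# gets its own GPU container: lowest latency, highest per-minute cost.
@app.function(gpu="A10G")
def generate_isolated(prompt: str) -> bytes:
    ...  # run the ComfyUI workflow here

# Option 2: multiple inputs on one container. @modal.concurrent lets a
# single GPU container serve several requests at once, amortizing cost
# at the expense of per-request response time.
@app.function(gpu="A10G")
@modal.concurrent(max_inputs=10)  # placeholder concurrency limit
def generate_batched(prompt: str) -> bytes:
    ...

# Option 3: warm pool. min_containers keeps containers resident so
# requests skip the cold start, in exchange for a baseline idle cost.
@app.function(gpu="A10G", min_containers=2)  # placeholder pool size
def generate_warm(prompt: str) -> bytes:
    ...
```

In practice these knobs combine (a warm pool of concurrent containers, for example), which is how the trade-off between speed, cost, and traffic shape gets tuned.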
