Modal demonstrates how to scale ComfyUI image-generation workflows as APIs using serverless autoscaling. Three approaches are tested: one container per input (4.4s median response, $0.18/min), concurrent processing on a single container (32s response, $0.02/min), and warm container pools (faster cold starts, $0.09/min baseline cost). Load testing with 100 concurrent users shows Modal scaling automatically to 62 GPUs. The right choice depends on balancing inference speed, cost, and traffic patterns.

6 min read · From modal.com
Table of contents

- Load testing with Locust
- Option 1: Run one container per input
- Option 2: Run multiple inputs on one container
- Option 3: Maintain a warm pool with min_containers
- Scaling to 100 concurrent users
- Conclusion
- Coda: Deploying with Comfy Deploy
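The three options above map to configuration knobs on a Modal function. The sketch below is illustrative only, not code from the article: the app name, GPU type, and function bodies are assumptions, and `max_inputs=10` and `min_containers=2` are placeholder values. Only `min_containers` is named in the post itself; the rest follows Modal's documented decorator API.

```python
import modal

app = modal.App("comfyui-scaling-sketch")  # hypothetical app name

# Option 1: one container per input (Modal's default). Every request
# gets its own GPU container: lowest latency, highest per-minute cost.
@app.function(gpu="A10G")
def generate_isolated(prompt: str) -> bytes:
    ...  # run the ComfyUI workflow here

# Option 2: multiple inputs on one container. @modal.concurrent lets a
# single GPU container serve several requests at once, amortizing cost
# at the expense of per-request response time.
@app.function(gpu="A10G")
@modal.concurrent(max_inputs=10)  # placeholder concurrency limit
def generate_batched(prompt: str) -> bytes:
    ...

# Option 3: warm pool. min_containers keeps containers resident so
# requests skip the cold start, in exchange for a baseline idle cost.
@app.function(gpu="A10G", min_containers=2)  # placeholder pool size
def generate_warm(prompt: str) -> bytes:
    ...
```

In practice these knobs combine (a warm pool of concurrent containers, for example), which is how the trade-off between speed, cost, and traffic shape gets tuned.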
