Atropos (SOSP'25) addresses application overload by canceling problematic tasks rather than throttling requests. Traditional overload control monitors global metrics and drops requests at random, so it misses internal resource contention caused by "rogue whale" tasks that monopolize buffer pools, locks, or queues. Atropos tracks logical resource usage and estimates future demand to identify which tasks cause the most harm. By canceling the highest-impact task through existing application hooks, it restores 96% of throughput while canceling less than 0.01% of requests. Tested across MySQL, Postgres, Elasticsearch, and other systems, it outperforms conventional approaches, which drop far more requests without resolving the underlying bottleneck.
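The selection step described above can be illustrated with a minimal sketch. It is not Atropos's actual policy or API: the `Task` type, the `contention`, `impact`, and `pick_victim` functions, the 0.9 threshold, and the scoring formula (current plus estimated future demand, weighted by how contended each resource is) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """Hypothetical per-task accounting record (not from the paper)."""
    task_id: str
    # Units of each logical resource currently held (buffer pages, lock slots, ...).
    held: Dict[str, float] = field(default_factory=dict)
    # Estimated additional units the task will still request (assumed to be given).
    future: Dict[str, float] = field(default_factory=dict)
    # Cancellation hook supplied by the application, e.g. a "kill query" call.
    cancel: Optional[Callable[[], None]] = None


def contention(tasks: List[Task], capacity: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each resource's capacity held across all tasks.

    Capacities are assumed to be positive.
    """
    used = {resource: 0.0 for resource in capacity}
    for task in tasks:
        for resource, amount in task.held.items():
            if resource in used:
                used[resource] += amount
    return {resource: used[resource] / capacity[resource] for resource in capacity}


def impact(task: Task, pressure: Dict[str, float], capacity: Dict[str, float]) -> float:
    """Assumed harm score: the task's current plus future demand on each
    resource, as a share of capacity, weighted by that resource's pressure."""
    score = 0.0
    for resource, p in pressure.items():
        demand = task.held.get(resource, 0.0) + task.future.get(resource, 0.0)
        score += p * demand / capacity[resource]
    return score


def pick_victim(
    tasks: List[Task], capacity: Dict[str, float], threshold: float = 0.9
) -> Optional[Task]:
    """Return the highest-impact task if any resource is at or above the
    threshold, otherwise None (nothing is canceled when there is no overload)."""
    pressure = contention(tasks, capacity)
    if not tasks or max(pressure.values(), default=0.0) < threshold:
        return None
    return max(tasks, key=lambda task: impact(task, pressure, capacity))
```

In this sketch, a single task holding most of a buffer pool is chosen over many small queries, and the caller would then invoke that task's `cancel` hook. Nothing is selected while every resource stays below the threshold.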