Introducing dial9: a flight recorder for Tokio
dial9 is a new runtime telemetry tool for Tokio. Rather than reporting only aggregate metrics, it captures a full timeline of runtime events: individual polls, parks, and wakes, along with Linux kernel events. It was built to diagnose performance issues that show up only in production. Among the problems it has helped identify are kernel scheduling delays of 10ms+ on an AWS service, fd_table lock contention that caused 100ms+ polls during startup, and a global mutex in backtrace::trace.

Because its overhead is under 5%, dial9 can run continuously in production. Setup requires wrapping the Tokio runtime with TracedRuntime. The resulting traces can be viewed in a browser-based viewer or stored in S3.
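A flight recorder keeps a bounded, timestamped log of recent events instead of only counters. The general idea fits in a few lines of std-only Rust. This is an illustration of the technique under our own assumptions, not dial9's implementation or API. The `FlightRecorder`, `Event`, and `EventKind` names are hypothetical, and the sketch does not hook into Tokio at all.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical event kinds, loosely mirroring the categories dial9 records.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    Poll { task_id: u64 },
    Park,
    Unpark,
    Wake { task_id: u64 },
}

/// One timestamped entry in the timeline (hypothetical layout).
#[derive(Debug, Clone, Copy)]
struct Event {
    at: Duration, // offset from when the recorder was created
    worker: usize,
    kind: EventKind,
}

/// Bounded ring buffer: once full, the oldest event is dropped,
/// so memory stays constant while the most recent timeline is kept.
struct FlightRecorder {
    start: Instant,
    capacity: usize,
    events: VecDeque<Event>,
}

impl FlightRecorder {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            start: Instant::now(),
            capacity,
            events: VecDeque::with_capacity(capacity),
        }
    }

    /// Append an event, evicting the oldest one if the buffer is full.
    fn record(&mut self, worker: usize, kind: EventKind) {
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        let at = self.start.elapsed(); // Instant is monotonic, so offsets never decrease
        self.events.push_back(Event { at, worker, kind });
    }

    /// Snapshot of the retained timeline, oldest first.
    fn timeline(&self) -> Vec<Event> {
        self.events.iter().copied().collect()
    }
}

fn main() {
    let mut rec = FlightRecorder::new(4);
    rec.record(0, EventKind::Unpark);
    rec.record(0, EventKind::Poll { task_id: 1 });
    rec.record(1, EventKind::Wake { task_id: 2 });
    rec.record(1, EventKind::Poll { task_id: 2 });
    rec.record(0, EventKind::Park); // buffer is full, so the initial Unpark is evicted

    for e in rec.timeline() {
        println!("{:?} worker={} {:?}", e.at, e.worker, e.kind);
    }
}
```

Dropping the oldest events once the buffer is full keeps memory use fixed. That is one common way to make always-on recording affordable. Whether dial9 bounds its buffers this way is not stated here.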