Speeding up Timely Dataflow by 100x
Frank McSherry demonstrates a 100x speedup in Timely Dataflow by introducing a third operator scheduling mode: 'notify only if holding a capability'. In a benchmark with 1,000 dataflows each containing ~1,000 operators, the naive approach required visiting all ~1,000,000 operators on every tick (similar to Flink's
Table of contents
The set-up: big dataflows
A reality check
Smartness: tracking progress in timely dataflow
Opting out of timestamp progress
Conclusions
Technical details and sneaky caveats