Datadog releases Toto 2.0, a family of five open-weights time series forecasting models ranging from 4M to 2.5B parameters. For the first time in the field, a time series foundation model (TSFM) demonstrates reliable improvement with scale: every size outperforms the one below it, with no saturation at 2.5B. Toto 2.0 achieves state-of-the-art results on the BOOM, GIFT-Eval, and TIME benchmarks, despite being trained only on observability and synthetic data (no public forecasting datasets). It is also 7× more parameter-efficient than Toto 1.0 and dramatically faster at inference thanks to contiguous patch masking (CPM). The post also discusses the open challenges that remain: closing the long-horizon gap with classical baselines, principled data curation, treating observability metrics as a distinct modality, and building multimodal world models for distributed systems. All model weights and the distributed u-μP training library are released under Apache 2.0.

11m read time, from datadoghq.com
Table of contents
Results
What’s next for TSFMs?
Release
Quick start
