We ran hundreds of benchmarks to tune storage systems for distributed training so you don’t have to.

SkyPilot

SkyPilot's MOUNT_CACHED storage mode now supports tunable parameters and named workload presets for optimizing object store access in AI training pipelines. After running over 1000 benchmarks, the team found that cache settings must differ significantly between model loading (large sequential reads) and dataset loading (many small random reads). Using optimized parameters for model loading yielded up to a 7.89x improvement in read bandwidth. Four presets are now available (MODEL_CHECKPOINT_RO, MODEL_CHECKPOINT_RW, DATASET_RO, and DATASET_RW), enabling one-line YAML configuration for common AI storage workloads. Key tips include using separate buckets for model weights and training data, and sizing the cache to match the working set.
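As a rough illustration of the separate-buckets tip, the task YAML below mounts one bucket for model checkpoints and another for training data, both in MOUNT_CACHED mode. It is a minimal sketch, not the post's own configuration: the bucket names and mount paths are hypothetical, and the field that selects a workload preset is deliberately left out because its exact name is not given in this summary.

```yaml
# Sketch only: bucket names and mount paths are hypothetical placeholders.
file_mounts:
  # Model weights/checkpoints: large sequential reads.
  /checkpoints:
    source: s3://my-model-bucket      # hypothetical bucket
    mode: MOUNT_CACHED

  # Training data: many small random reads, kept in its own bucket
  # so each mount can be tuned for its own access pattern.
  /data:
    source: s3://my-dataset-bucket    # hypothetical bucket
    mode: MOUNT_CACHED
```

Keeping the two mounts separate is what allows each one to take different cache settings, or a different preset, once those are applied.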

Cache Me If You Can: Tuning Object Stores for AI

Background: AI Storage Workloads and the Importance of Caching

Configurations to Tune MOUNT_CACHED Mode

Introducing MOUNT_CACHED Workload Types