SkyPilot's MOUNT_CACHED storage mode now supports tunable parameters and named workload presets for optimizing object store access in AI training pipelines. After running over 1000 benchmarks, the team found that cache settings must differ significantly between model loading (large sequential reads) and dataset loading (many small random reads). Using optimized parameters for model loading yielded up to 7.89x read bandwidth improvement. Four presets are now available — MODEL_CHECKPOINT_RO, MODEL_CHECKPOINT_RW, DATASET_RO, and DATASET_RW — enabling one-line YAML configuration for common AI storage workloads. Key tips include using separate buckets for model weights and training data, and sizing the cache to match the working set.

7m read timeFrom blog.skypilot.co
Post cover image
Table of contents
Background: AI Storage Workloads and the Importance of Caching #Configurations to Tune MOUNT_CACHED Mode #AI Storage Workload Benchmarks #Speeding Up Model Loading #Training Data Are Not Model Weights #Introducing MOUNT_CACHED Workload Types #Tips and Tricks #Conclusion #Next Steps #

Sort: