A detailed benchmark comparing ML training throughput on Tigris Object Storage versus AWS S3, using the S3 Connector for PyTorch on a g5.8xlarge instance. Results show Tigris delivers ~134 samples/sec on random-access data, within 3% of AWS S3's ~138 samples/sec. The post also introduces the Tigris Acceleration Gateway (TAG).
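The "within 3%" figure follows directly from the two reported throughput numbers; a quick arithmetic check (both values taken from the summary above):

```python
# Sanity-check the headline claim: Tigris within 3% of AWS S3 throughput.
tigris_sps = 134  # Tigris, random-access data (samples/sec)
aws_sps = 138     # AWS S3, random-access data (samples/sec)

gap = (aws_sps - tigris_sps) / aws_sps
print(f"Tigris trails AWS S3 by {gap:.1%}")  # prints "Tigris trails AWS S3 by 2.9%"
```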

13m read time From tigrisdata.com
Table of contents

- Background: why data loading is a bottleneck
- Benchmark setup
- Part 1: Random access — Tigris throughput scales near-linearly to 16 workers, where the GPU saturates at ~134 samples/sec
- Part 2: Sequential access (sharded) — GPU saturates at 8 workers regardless of shard size; sharding halves worker requirements versus random access
- Part 3: Multi-epoch training with TAG — warm-cache epochs complete 5.7x faster; GPU saturates at 4 workers versus 16 without caching
- Part 4: Entitlement benchmark — Tigris delivers samples at 46x GPU demand; TAG reaches ~200x at peak
- Summary
- Future work
- About TAG
