Managing and distributing large AI model weight files at enterprise scale is a critical but often overlooked infrastructure challenge. The current approaches (Git LFS, object storage, and distributed filesystems) each have significant shortcomings. The post proposes a cloud-native solution that treats model weights as first-class OCI artifacts, so models get the same delivery pipeline used for software containers: versioning, immutability, RBAC, supply chain security, and GitOps-driven deployment. The toolchain consists of modctl for packaging models into OCI artifacts, Harbor for artifact registry management, and Dragonfly for P2P-based distribution, which achieves 70–80% bandwidth utilization across nodes. Deployment uses Kubernetes OCI Volumes (alpha in 1.31, beta in 1.33, GA expected in 1.36) or, on older clusters, a Model CSI Driver; either way, model data is decoupled from inference engines such as vLLM. Future work includes lazy loading, RDMA acceleration, and model security scanning.
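To make the OCI Volumes deployment path concrete, here is a minimal Python sketch that builds a Pod manifest mounting a model artifact as a read-only image volume next to a vLLM container. It relies on the Kubernetes `image` volume source and its `reference` and `pullPolicy` fields; the registry path `harbor.example.com/models/demo-model:v1`, the pod name, the volume name, and the helper `build_inference_pod` are hypothetical placeholders, and GPU resource requests are omitted for brevity.

```python
import json


def build_inference_pod(model_ref: str, mount_path: str = "/models") -> dict:
    """Return a Pod manifest that mounts an OCI model artifact as a volume.

    model_ref is an OCI reference to the packaged model (hypothetical here).
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "vllm-demo"},  # hypothetical pod name
        "spec": {
            "containers": [
                {
                    "name": "inference",
                    "image": "vllm/vllm-openai:latest",
                    # The engine reads weights from the mounted path,
                    # so the model is not baked into the engine image.
                    "args": ["--model", mount_path],
                    "volumeMounts": [
                        {
                            "name": "model-weights",
                            "mountPath": mount_path,
                            "readOnly": True,
                        }
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "model-weights",
                    # Image volume source: the kubelet pulls the artifact
                    # and exposes its contents as a read-only volume.
                    "image": {
                        "reference": model_ref,
                        "pullPolicy": "IfNotPresent",
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = build_inference_pod("harbor.example.com/models/demo-model:v1")
    print(json.dumps(pod, indent=2))
```

Because the model reference lives in the volume definition and not in the container image, a new model version can be rolled out by changing only `reference`, which is what makes GitOps-style promotion of model artifacts possible.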

11 min read · From cncf.io
Table of contents
Rethinking the delivery pipeline: Models deserve better than a shell script
Future
Collaborative Projects
References
