A guide to structuring production ML pipelines on Snowflake using the `submit_directory` API, which ships an entire Python project directory to a compute pool rather than a single function. The framework uses a three-layer architecture (pipelines/source/config), a centralised YAML config file, six independently runnable pipeline stages (feature engineering, training, promotion, inference, scheduling, monitoring), and a CLI with range-based execution for CI/CD integration. Dev tooling (black, ruff, isort, and pre-commit hooks) keeps the codebase team-ready. This post is part 4 of a series evolving from notebook prototypes to a production-grade, multi-engineer ML framework on Snowflake.
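The range-based CLI execution mentioned above could be sketched roughly like this. This is a minimal illustration, not the post's actual implementation: the stage names, the `start:end` range syntax, and the `resolve_range` helper are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical ordered stage registry; names mirror the six stages the
# post lists but are shortened for illustration.
STAGES: list[tuple[str, Callable[[], None]]] = [
    ("features", lambda: print("feature engineering")),
    ("train", lambda: print("training")),
    ("promote", lambda: print("promotion")),
    ("infer", lambda: print("inference")),
    ("schedule", lambda: print("scheduling")),
    ("monitor", lambda: print("monitoring")),
]


def resolve_range(spec: str) -> list[str]:
    """Expand a 'start:end' spec (inclusive) into ordered stage names.

    An empty side of the colon defaults to the first/last stage;
    a spec with no colon names a single stage.
    """
    names = [name for name, _ in STAGES]
    if ":" not in spec:
        return [spec]
    start, end = spec.split(":", 1)
    i = names.index(start) if start else 0
    j = names.index(end) if end else len(names) - 1
    return names[i : j + 1]


def run(spec: str) -> None:
    # Execute the selected stages in pipeline order.
    registry = dict(STAGES)
    for name in resolve_range(spec):
        registry[name]()


if __name__ == "__main__":
    run("train:infer")  # training, promotion, inference
```

A CI job can then invoke a contiguous slice of the pipeline (say, everything from training through inference) without re-running feature engineering, which is the practical payoff of range-based execution.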
Table of contents
Introduction
Why This Approach
Prerequisites
Project Structure
Design Choice 1: Three-Layer Architecture
Design Choice 2: Centralised Configuration
Design Choice 3: submit_directory
Design Choice 4: CLI Entrypoint with Pipeline Ranges
The Pipeline Stages
Design Choice 5: Session Factory
Design Choice 6: Dev Tooling
What Changed from Part 3
Vibe Code Your Own Pipeline
Key Takeaways
What is Next
Snowflake Services Used
Series Summary
Repository