A guide to structuring production ML pipelines on Snowflake using the `submit_directory` API, which ships an entire Python project directory to a compute pool rather than a single function. The framework uses a three-layer architecture (pipelines/source/config), a centralised YAML config file, six independently runnable pipeline stages (feature engineering, training, promotion, inference, scheduling, monitoring), and a CLI with range-based execution for CI/CD integration. Dev tooling (black, ruff, isort, and pre-commit hooks) keeps the codebase team-ready. This post is part 4 of a series evolving from notebook prototypes to a production-grade, multi-engineer ML framework on Snowflake.
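The range-based CLI execution mentioned above could be sketched roughly like this. This is a minimal illustration, not the post's actual implementation: the stage names, the `start:end` range syntax, and the `resolve_range` helper are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical ordered stage registry; names mirror the six stages the
# post lists but are shortened for illustration.
STAGES: list[tuple[str, Callable[[], None]]] = [
    ("features", lambda: print("feature engineering")),
    ("train", lambda: print("training")),
    ("promote", lambda: print("promotion")),
    ("infer", lambda: print("inference")),
    ("schedule", lambda: print("scheduling")),
    ("monitor", lambda: print("monitoring")),
]


def resolve_range(spec: str) -> list[str]:
    """Expand a 'start:end' spec (inclusive) into ordered stage names.

    An empty side of the colon defaults to the first/last stage;
    a spec with no colon names a single stage.
    """
    names = [name for name, _ in STAGES]
    if ":" not in spec:
        return [spec]
    start, end = spec.split(":", 1)
    i = names.index(start) if start else 0
    j = names.index(end) if end else len(names) - 1
    return names[i : j + 1]


def run(spec: str) -> None:
    # Execute the selected stages in pipeline order.
    registry = dict(STAGES)
    for name in resolve_range(spec):
        registry[name]()


if __name__ == "__main__":
    run("train:infer")  # training, promotion, inference
```

A CI job can then invoke a contiguous slice of the pipeline (say, everything from training through inference) without re-running feature engineering, which is the practical payoff of range-based execution.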
Table of contents
Introduction
Why This Approach
Prerequisites
Project Structure
Design Choice 1: Three-Layer Architecture
Design Choice 2: Centralised Configuration
Design Choice 3: submit_directory
Design Choice 4: CLI Entrypoint with Pipeline Ranges
The Pipeline Stages
Design Choice 5: Session Factory
Design Choice 6: Dev Tooling
What Changed from Part 3
Vibe Code Your Own Pipeline
Key Takeaways
What is Next
Snowflake Services Used
Series Summary
Repository