An exploration of a lightweight, open-source alternative to traditional SaaS data engineering platforms, highlighting the benefits and trade-offs of each approach.

Scott Logic

A comparison of lightweight open-source data engineering stacks versus SaaS platforms like Databricks and Microsoft Fabric. For small projects with a single data source and low data volumes, a self-hosted stack using Kafka, Docker, Prefect, and PostgreSQL can be cheaper and more developer-friendly. The post walks through a concrete Docker Compose setup, explains why PostgreSQL's JSONB support suits bronze-layer ingestion, and shows how Prefect's pure-Python approach enables better code structure, testability, and flexibility (e.g., swapping Spark for Dask). Trade-offs are framed as a choice between an all-inclusive holiday (SaaS) and booking everything yourself (lightweight), with guidance on when each approach makes sense.
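To make the bronze-layer and pure-Python points concrete, here is a minimal sketch rather than the post's actual code. It assumes the bronze layer stores each incoming message unchanged as a JSON document alongside a little ingestion metadata. The function names (`to_bronze_row`, `ingest_batch`), the row fields, and the `sensor-topic` source are hypothetical, and only the standard library is used so the example runs without Kafka, PostgreSQL, or Prefect installed.

```python
import json
from datetime import datetime, timezone


def to_bronze_row(raw_message: bytes, source: str) -> dict:
    """Wrap one raw message as a bronze-layer row without reshaping it."""
    # Parse only to confirm the message is valid JSON; no fields are renamed or dropped.
    payload = json.loads(raw_message)
    return {
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Assumption: in PostgreSQL this string would be written to a JSONB column.
        "payload": json.dumps(payload, sort_keys=True),
    }


def ingest_batch(messages: list[bytes], source: str) -> list[dict]:
    """Plain Python function; under Prefect it could be wrapped with @flow."""
    return [to_bronze_row(message, source) for message in messages]


# Hypothetical messages with differing shapes, as a schemaless bronze layer allows.
rows = ingest_batch([b'{"id": 1, "temp": 21.5}', b'{"id": 2}'], source="sensor-topic")
```

Because both functions are ordinary Python, they can be unit tested without an orchestrator or a running database, which is the testability benefit the summary attributes to the pure-Python approach.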

Data Engineering on a Budget