Decathlon Switches to Polars to Optimize Data Pipelines and Infrastructure Costs
Decathlon migrated data pipelines processing small to mid-size datasets (under 50 GiB) from Apache Spark clusters to Polars running on single Kubernetes pods. The switch reduced compute launch time from 8 to 2 minutes and significantly lowered infrastructure costs. Polars' streaming engine enables processing datasets larger than available memory on modest hardware. The team now uses Polars for new pipelines with stable, smaller input tables that don't require complex joins or aggregations, while keeping Spark for terabyte-scale workloads. Challenges include managing Kubernetes infrastructure and limitations with certain Delta Lake features.