Explore how the Informatica data integration framework moved from a single-node setup to a scalable Spark environment on Kubernetes.

Salesforce Engineering

Shivangi Srivastava, Senior Director at Salesforce/Informatica, explains how Cloud Data Integration (CDI) evolved from a single-node engine to a distributed Spark-on-Kubernetes platform serving 5,500 enterprise customers running 250,000 daily pipelines. Key engineering decisions include extending open-source Spark into 'Spark++' for enterprise features like lineage tracking, preserving backward compatibility for existing graphical pipeline abstractions, and implementing a FinOps automation layer with three components (Cluster Lifecycle Manager, Cluster Tuner, Job Tuner) that reduces infrastructure costs by ~1.65x. The architecture separates the control plane from the data plane to maintain 99.9% availability during compute spikes.
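A "~1.65x" cost reduction is easy to misread as a 65% saving. The sketch below works through the arithmetic. It assumes "reduces costs by Nx" means the new cost equals the old cost divided by N. The helper name `savings_from_factor` and the 100-unit baseline are hypothetical and exist only for illustration.

```python
def savings_from_factor(factor: float) -> float:
    """Fraction of the original cost saved when cost shrinks by `factor`x.

    Assumption: "reduces costs by Nx" means new_cost = old_cost / N.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


# Hypothetical baseline: 100 cost units before the FinOps automation layer.
baseline = 100.0
factor = 1.65  # reduction factor quoted in the summary
after = baseline / factor

print(f"remaining cost: {after:.1f} units")             # remaining cost: 60.6 units
print(f"savings: {savings_from_factor(factor):.1%}")    # savings: 39.4%
```

Under this reading, a 1.65x reduction leaves about 61% of the original spend, which is a saving of roughly 39%.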

Informatica’s Data Integration Platform: Running 250K Enterprise Pipelines Daily