Salesforce's Data 360 team orchestrates nearly 2 million Spark applications daily on Kubernetes. The default kube-scheduler's LeastAllocated strategy caused node fragmentation by spreading Spark executors across many nodes, leaving idle capacity that Karpenter's reactive consolidation couldn't reclaim without disrupting running workloads.
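The fragmentation described above stems from the `NodeResourcesFit` plugin's default `LeastAllocated` scoring, which prefers emptier nodes. The standard remedy is to switch the scoring strategy to `MostAllocated`, which bin-packs pods onto already-busy nodes so idle nodes drain and become reclaimable. A minimal sketch of that kube-scheduler configuration (the article excerpt does not confirm this exact config; the profile name is an assumption for illustration):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Hypothetical profile name; pods opt in via spec.schedulerName.
  - schedulerName: bin-packing-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # MostAllocated scores nodes higher as utilization rises,
            # packing executors tightly instead of spreading them.
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

With tight packing, consolidation tooling such as Karpenter can remove fully drained nodes rather than having to evict executors scattered across partially filled ones.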

5 min read. From engineering.salesforce.com.
