Pinterest reduced out-of-memory errors in Apache Spark by 96% through Auto Memory Retries, a feature that automatically identifies memory-intensive tasks and retries them on larger executors. The system uses a hybrid approach: first doubling CPU allocation per task to give it more memory on existing executors, then launching
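The first step of the hybrid approach rests on simple arithmetic, sketched below. In stock Spark, an executor runs `spark.executor.cores / spark.task.cpus` tasks at once, so doubling the CPUs reserved per task halves the number of concurrent tasks, and each task's rough share of executor memory doubles. The executor size used here is hypothetical, the even split of memory across task slots is a simplifying assumption, and the summary does not say how Pinterest applies the setting internally.

```python
def task_slots(executor_cores: int, task_cpus: int) -> int:
    """Number of tasks one executor can run concurrently."""
    return executor_cores // task_cpus


def approx_memory_per_task_gb(
    executor_memory_gb: float, executor_cores: int, task_cpus: int
) -> float:
    """Rough per-task memory share, assuming memory is split evenly across slots."""
    return executor_memory_gb / task_slots(executor_cores, task_cpus)


# Hypothetical executor shape, not taken from the article.
EXECUTOR_MEMORY_GB = 16.0
EXECUTOR_CORES = 8

# First attempt: one CPU per task, so 8 tasks share the executor.
first_attempt = approx_memory_per_task_gb(EXECUTOR_MEMORY_GB, EXECUTOR_CORES, task_cpus=1)

# Retry: two CPUs per task, so only 4 tasks share the same executor.
retry = approx_memory_per_task_gb(EXECUTOR_MEMORY_GB, EXECUTOR_CORES, task_cpus=2)

print(first_attempt, retry)
```

The trade-off is lower parallelism on that executor, which is presumably why the extra CPUs are reserved only for tasks that have already failed with an out-of-memory error.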

15m read time From medium.com
Table of contents
Spark Platform
Problem Identification
Implementation
Rollout & Monitoring
Results
Learnings
Future
Conclusion
Acknowledgements
References
