Pinterest reduced out-of-memory errors in Apache Spark by 96% through Auto Memory Retries, a feature that automatically identifies memory-intensive tasks and retries them on larger executors. The system uses a hybrid approach: first doubling CPU allocation per task to give it more memory on existing executors, then launching
Table of contents
Spark Platform
Problem Identification
Implementation
Rollout & Monitoring
Results
Learnings
Future
Conclusion
Acknowledgements
References
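The summary's first step, doubling the CPU allocation per task, can be illustrated with a little arithmetic. The sketch below is not Pinterest's implementation. It assumes a simplified model in which an executor's memory is split evenly across its concurrent task slots (Spark's real memory manager is more dynamic), and the names `ExecutorShape`, `memory_per_task_mb`, and `retry_task_cpus` are hypothetical helpers invented for this example. The fields loosely mirror the Spark settings `spark.executor.memory`, `spark.executor.cores`, and `spark.task.cpus`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutorShape:
    """Hypothetical description of one executor (illustrative values only)."""
    memory_mb: int  # loosely mirrors spark.executor.memory, in MiB
    cores: int      # loosely mirrors spark.executor.cores


def concurrent_tasks(executor: ExecutorShape, task_cpus: int) -> int:
    """Number of tasks that can run at once when each task reserves task_cpus cores."""
    if task_cpus < 1 or task_cpus > executor.cores:
        raise ValueError("task_cpus must be between 1 and the executor's core count")
    return executor.cores // task_cpus


def memory_per_task_mb(executor: ExecutorShape, task_cpus: int) -> float:
    """Approximate memory share per task, assuming an even split across task slots."""
    return executor.memory_mb / concurrent_tasks(executor, task_cpus)


def retry_task_cpus(executor: ExecutorShape, task_cpus: int) -> Optional[int]:
    """Return the doubled CPU reservation for a retry, or None if the executor
    has too few cores, in which case a larger executor would be needed."""
    doubled = task_cpus * 2
    return doubled if doubled <= executor.cores else None


if __name__ == "__main__":
    executor = ExecutorShape(memory_mb=16384, cores=4)  # assumed example sizes
    first_attempt = 1
    retry = retry_task_cpus(executor, first_attempt)
    print(memory_per_task_mb(executor, first_attempt))
    if retry is not None:
        print(memory_per_task_mb(executor, retry))
```

Under this model, reserving twice as many cores per task halves the number of tasks sharing the executor, so each retried task gets roughly twice the memory without changing the executor itself. When doubling no longer fits within the executor's cores, the helper returns `None`, which corresponds to the case where a retry would have to move to a larger executor.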