Apache Spark committer Holden Karau discusses distributed data processing, the evolution from MapReduce to Spark, and how Spark handles modern ML workloads with GPUs. The conversation covers common mistakes in distributed computing (like ignoring data skew), resource profiles for GPU optimization, and the interplay between data …

41m watch time
