On-call week in Data Engineering: the issues I worked on during five days of on-call last week, and how each one was solved. In this blog, I cover what went wrong in Spark, starting with the fix for converting pandas to pandas_on_spark.
Table of contents
Pandas to pandas_on_spark
Class not found exception (Spark serialization error)
PyArrow compatibility issue
Insert Overwrite on the same partition
Ambiguous columns
Job failure during write
Null values in Spark query, but data present in Hive table
Spark local dir full issue
The broadcast hint not working
Unable to work with PySpark with Python 3.8 in Livy
Null Pointer Exception in PySpark
Executor dying due to OOM