A developer building a custom dataframe library explores the theoretical foundations of dataframe operations. Starting from Petersohn et al.'s dataframe algebra (15 operators covering 200+ pandas methods), the author discovers that the relational core maps onto exactly three categorical patterns: Delta (restructuring/projection), Sigma (merging/groupby), and Pi (pairing/join). These correspond to an adjoint triple Σ ⊣ Δ ⊣ Π from Fong and Spivak's category theory work, which explains why these three operations compose correctly. The post shows how this framework guides API design in a typed Haskell dataframe library where schema transitions are verified at compile time, and how the algebraic laws enable safe query optimizations such as predicate pushdown and filter fusion.
Table of contents
Petersohn’s dataframe algebra
Three shapes of schema change
Why these three
Designing an API around the three patterns
Where this is going