Best of Pandas: November 2024

  1.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    Pandas vs. FireDucks Performance Comparison

    FireDucks is a highly optimized alternative to Pandas that gets much of its speed from lazy execution. Switching only requires replacing the Pandas import with the FireDucks one. In the benchmarks the post cites, FireDucks runs faster than Pandas and than alternatives such as Modin and Polars. The post also covers installing FireDucks, using it in Jupyter Notebook, and integrating it into existing Python scripts.
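
The import swap described above can be sketched as follows. This is a minimal illustration, not code from the post: the sample data and the fallback to plain pandas are assumptions, added so the snippet still runs where FireDucks is not installed.

```python
# Try the FireDucks drop-in module first; fall back to plain pandas
# (the fallback is an assumption of this sketch, not part of the post).
try:
    import fireducks.pandas as pd
except ImportError:
    import pandas as pd

# Hypothetical sample data; the rest of the code is ordinary pandas.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "points": [3, 1, 4, 2]})

# Group, aggregate, and materialize the result as a plain dict.
totals = df.groupby("team")["points"].sum().sort_index()
result = totals.to_dict()
print(result)
```

Because only the import line changes, the same script can be timed with either library without touching the analysis code.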

  2.
    Article
    Medium · 2y

    Data Validation with Pandera in Python

    Pandera is a Python library designed to validate dataframe-like objects in production ML pipelines. It supports various dataframe libraries including pandas, polars, dask, modin, and pyspark.pandas. Users can define schemas and models to enforce column types and properties, set custom validations, and use configurations like strict, coerce, and lazy validation to streamline data processing. Integrating Pandera in ML pipelines helps ensure data quality and prevents processing errors, offering robust data checks and handling invalid rows efficiently.

  3.
    Article
    Towards AI · 2y

    This Pandas Trick Will Blow Your Mind As a Data Scientist!

    Learn how to automate data analysis with Pandas through an 8-step process. The guide covers setting up your environment, uploading CSV files, and generating comprehensive reports with just one click. Essential libraries include Pandas, NumPy, ipywidgets, Matplotlib, and Seaborn.
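
The core of such an automated report can be sketched with pandas alone. This is not the article's 8-step process: the `quick_report` function and the sample CSV are hypothetical, and the upload widget and plots are left out.

```python
import io

import pandas as pd


def quick_report(csv_source) -> dict:
    """Load a CSV (path or file-like object) and collect basic summary facts."""
    df = pd.read_csv(csv_source)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),  # missing values per column
        "numeric_summary": df.describe(),      # count, mean, std, quartiles
    }


# Hypothetical in-memory CSV standing in for an uploaded file.
sample = io.StringIO("city,temp\nOslo,3.5\nCairo,\nLima,19.0\n")
report = quick_report(sample)
print(report["missing"])
```

In a notebook, the same function could be wired to an upload widget so that choosing a file triggers the report.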

  4.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    How to Create a Calendar Plot in Python?

    Calendar plots are an effective way to visualize day-to-day variations in data over a longer period, typically a year. Using the `calplot()` function from the Plotly-based `plotly-calplot` library, you can create a calendar plot with just two lines of Python code. This plot type is particularly useful for detecting weekly or monthly seasonality in data and can reveal patterns that traditional plots or aggregation methods might miss.
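
A calendar plot is essentially a weekday-by-week grid of daily values. The sketch below does not call `calplot()`. It uses only pandas to build that grid, so the layout behind the plot is visible. The `calendar_grid` helper and the random sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def calendar_grid(daily: pd.Series) -> pd.DataFrame:
    """Reshape a date-indexed daily series into a weekday x week grid.

    Assumes the index is sorted, has one entry per day, and has no duplicates.
    """
    dates = daily.index
    offset = dates[0].dayofweek           # 0 = Monday
    day_number = (dates - dates[0]).days  # days elapsed since the first date
    frame = pd.DataFrame(
        {
            "weekday": dates.dayofweek,
            "week": (day_number + offset) // 7,  # running week index from 0
            "value": daily.to_numpy(),
        }
    )
    return frame.pivot(index="weekday", columns="week", values="value")


# Hypothetical data: one random value per day of 2024.
dates = pd.date_range("2024-01-01", "2024-12-31", freq="D")
rng = np.random.default_rng(0)
daily = pd.Series(rng.integers(0, 100, len(dates)), index=dates)

grid = calendar_grid(daily)
print(grid.shape)
```

Each row of `grid` is a weekday and each column a week, so weekly seasonality shows up as horizontal bands once the grid is drawn as a heatmap.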