Best of Pandas 2024

  1. Video
    Tech With Tim · 2y

    Master Python With This ONE Project!

    This post guides you through building a personal finance tracker in Python, covering syntax, advanced features, and popular modules like Pandas and Matplotlib. The project involves tracking and logging transactions, organizing data, generating summaries of income and expenses, and visualizing the data with graphs. It also explains how to use CSV files for data storage and offers a quick demo followed by step-by-step instructions for implementation.
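A minimal sketch of the CSV-backed income and expense summary described above. The column names (`date`, `amount`, `category`, `description`), the sample rows, and the `summarize` helper are assumptions made for illustration, not the video's actual code; an in-memory string stands in for the CSV file.

```python
import io

import pandas as pd

# Hypothetical transaction log; a real tracker would read this from a CSV file on disk.
csv_text = """date,amount,category,description
2024-01-05,2500.00,Income,Salary
2024-01-07,45.50,Expense,Groceries
2024-01-12,120.00,Expense,Utilities
2024-02-01,2500.00,Income,Salary
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])


def summarize(frame, start, end):
    """Total income, expenses, and net amount between two dates (inclusive)."""
    mask = (frame["date"] >= start) & (frame["date"] <= end)
    window = frame.loc[mask]
    income = window.loc[window["category"] == "Income", "amount"].sum()
    expense = window.loc[window["category"] == "Expense", "amount"].sum()
    return {
        "income": float(income),
        "expense": float(expense),
        "net": float(income - expense),
    }


january = summarize(df, "2024-01-01", "2024-01-31")
print(january)
```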

  2. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    Pandas vs. FireDucks Performance Comparison

    FireDucks is a highly optimized alternative to Pandas that gains much of its speed from lazy execution. Users only need to replace their Pandas import with FireDucks. The benchmarks in the post show FireDucks outperforming Pandas as well as other libraries such as Modin and Polars. The post provides instructions for installing FireDucks, using it in Jupyter Notebook, and integrating it into existing Python scripts.

  3. Article
    KDnuggets · 2y

    How to Speed Up Python Pandas by Over 300x

    Pandas is a popular open-source data manipulation and analysis library for Python, widely used in various fields. The post shows how vectorization can speed up data analysis by over 300x. Vectorized operations act on entire arrays of data at once instead of processing each element individually, which makes better use of memory and CPU resources. Compared to looping and the apply method, vectorization is significantly faster. In the post's examples, a dataset calculation that took 3.66 seconds using loops is reduced to just 10.4 milliseconds using vectorization, a speedup of roughly 350x.
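A small sketch of the three approaches the summary compares: an explicit loop, `apply`, and a vectorized expression. The `price` and `quantity` columns are invented for illustration and are not the dataset from the post. All three produce the same values; only the vectorized form operates on whole columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "price": np.arange(1, 1001, dtype="float64"),
    "quantity": np.arange(1000, 0, -1),
})


def total_with_loop(frame):
    """Row-by-row Python loop: the slowest of the three approaches."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["quantity"])
    return pd.Series(totals, index=frame.index)


def total_with_apply(frame):
    """apply with axis=1 still calls a Python function once per row."""
    return frame.apply(lambda row: row["price"] * row["quantity"], axis=1)


def total_vectorized(frame):
    """One expression over whole columns, evaluated in compiled code."""
    return frame["price"] * frame["quantity"]


df["total"] = total_vectorized(df)
```

Timing each function with `%timeit` in a notebook is one way to see the gap on your own data; the exact ratio depends on the dataset and the operation.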

  4. Article
    KDnuggets · 2y

    Building Data Science Pipelines Using Pandas

    Learn to build end-to-end data science pipelines using the Pandas pipe method. This method enhances code readability, enables function chaining, and improves code organization. The tutorial includes transforming code into a pipeline structure that handles data ingestion, cleaning, analysis, and visualization, demonstrating a comparison between pipeline and non-pipeline approaches.
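To make the function-chaining idea concrete, here is a minimal sketch using `DataFrame.pipe`. The step functions (`drop_missing`, `add_revenue`, `top_n`) and the sample data are hypothetical and are not taken from the tutorial.

```python
import pandas as pd


def drop_missing(frame):
    """Cleaning step: drop rows with any missing value."""
    return frame.dropna()


def add_revenue(frame, price_col, qty_col):
    """Analysis step: add a derived column without mutating the input."""
    return frame.assign(revenue=frame[price_col] * frame[qty_col])


def top_n(frame, column, n):
    """Reporting step: keep the n rows with the largest values in column."""
    return frame.nlargest(n, column)


raw = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "price": [2.0, 5.0, None, 1.0],
    "qty": [10, 1, 3, 4],
})

# Each .pipe call passes the frame as the first argument of the given function.
result = (
    raw
    .pipe(drop_missing)
    .pipe(add_revenue, price_col="price", qty_col="qty")
    .pipe(top_n, column="revenue", n=2)
)
print(result)
```

The non-pipeline equivalent would nest the calls as `top_n(add_revenue(drop_missing(raw), ...), ...)`, which reads inside-out; the chained form lists the steps in the order they run.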

  5. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    6 Elegant Jupyter Hacks

    Discover 6 elegant Jupyter hacks to improve your experience. Learn how to retrieve a cell's output, enrich the default preview of a DataFrame, generate helpful hints as you write Pandas code, improve rendering of DataFrames, restart the Jupyter kernel without losing variables, and search code in all Jupyter Notebooks from the terminal.

  6. Article
    Machine Learning Mastery · 2y

    10 Python One-Liners That Will Boost Your Data Science Workflow

    Python offers versatile one-liners to enhance your data science workflow. Learn efficient methods to handle missing data, remove highly correlated features, apply conditional columns, find common and different elements, and use Boolean masks for filtering. Other techniques include counting occurrences in lists, extracting numbers from text, flattening nested lists, converting lists to dictionaries, and merging dictionaries efficiently.
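A few of the list and dictionary techniques named above can be sketched with the standard library alone. These are common idioms and not necessarily the exact one-liners from the post; all sample values are made up.

```python
import re
from collections import Counter
from itertools import chain

# Count occurrences in a list.
labels = ["spam", "ham", "spam", "spam"]
label_counts = Counter(labels)

# Extract numbers from text.
numbers = [int(tok) for tok in re.findall(r"\d+", "order 66 shipped 3 of 12 boxes")]

# Flatten a nested list (one level deep).
flat = list(chain.from_iterable([[1, 2], [3], [4, 5]]))

# Convert two lists into a dictionary.
lookup = dict(zip(["a", "b", "c"], [1, 2, 3]))

# Merge dictionaries; keys from the right-hand dictionary win on conflict.
merged = {**{"lr": 0.1, "epochs": 5}, **{"epochs": 10}}

# Find common and different elements with set operations.
left, right = {1, 2, 3, 4}, {3, 4, 5}
common, only_left = left & right, left - right
```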

  7. Article
    Machine Learning Mastery · 2y

    Automating Data Cleaning Processes with Pandas

    Discover how to automate data cleaning processes using the Pandas library. Learn about typical data cleaning functions like filling missing values, removing duplicates, manipulating strings, and converting date formats. The post also introduces a custom class, DataCleaner, to encapsulate these steps into a reusable pipeline for an efficient and systematic approach to data cleaning.
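The post's `DataCleaner` class is not reproduced here. The sketch below is a hypothetical class with the same name, showing how the four steps listed above might be wrapped into one reusable object; the constructor arguments and the `clean` method are assumptions.

```python
import pandas as pd


class DataCleaner:
    """Hypothetical reusable cleaning pipeline; not the class from the post."""

    def __init__(self, fill_values=None, text_columns=(), date_columns=()):
        self.fill_values = fill_values or {}
        self.text_columns = list(text_columns)
        self.date_columns = list(date_columns)

    def clean(self, frame):
        out = frame.copy()
        # 1. Fill missing values, column by column.
        if self.fill_values:
            out = out.fillna(self.fill_values)
        # 2. Normalize strings so near-duplicates compare equal.
        for col in self.text_columns:
            out[col] = out[col].str.strip().str.lower()
        # 3. Convert date strings to datetime values.
        for col in self.date_columns:
            out[col] = pd.to_datetime(out[col])
        # 4. Remove exact duplicate rows.
        return out.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame({
    "name": [" Alice", "alice ", "Bob", None],
    "signup": ["2024-01-05", "2024-01-05", "2024-02-10", "2024-03-01"],
    "score": [1.0, 1.0, None, 3.0],
})

cleaner = DataCleaner(
    fill_values={"name": "unknown", "score": 0.0},
    text_columns=["name"],
    date_columns=["signup"],
)
cleaned = cleaner.clean(raw)
print(cleaned)
```

String normalization runs before duplicate removal in this sketch so that `" Alice"` and `"alice "` collapse into a single row.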

  8. Article
    Medium · 2y

    Data Validation with Pandera in Python

    Pandera is a Python library designed to validate dataframe-like objects in production ML pipelines. It supports various dataframe libraries including pandas, polars, dask, modin, and pyspark.pandas. Users can define schemas and models to enforce column types and properties, set custom validations, and use configurations like strict, coerce, and lazy validation to streamline data processing. Integrating Pandera in ML pipelines helps ensure data quality and prevents processing errors, offering robust data checks and handling invalid rows efficiently.

  9. Article
    Hacker News · 2y

    Spreadsheet UI for Python

    PySheets provides a spreadsheet UI for Python, allowing users to perform exploratory data science, use Pandas, create charts with matplotlib, import Excel sheets, analyze data, and create reports. All the Python code runs in the browser, and PySheets itself is also written in Python. Listed features include collaboration, unlimited sheets, community support, and unlimited AI generations.

  10. Article
    KDnuggets · 2y

    Utilizing Pandas AI for Data Analysis

    Learn how to utilize Pandas AI for data analysis, including setup, data exploration, data visualization, and advanced usage.

  11. Article
    Medium · 2y

    High-Performance Python Data Processing: pandas 2 vs. Polars, a vCPU Perspective

    Polars is emerging as a strong competitor to pandas for Python data analysis, boasting significant performance improvements due to its Rust backend optimized for parallel processing and vectorized operations. This post tests Polars against pandas with varying vCores, finding Polars generally faster, though it encounters some challenges with single vCore setups. While Polars shows great promise, considerations like cost, compatibility, and maturity remain important when evaluating a switch from pandas.

  12. Article
    JetBrains · 2y

    How to Move From pandas to Polars

    Polars is gaining popularity in the data science community due to its speed and security benefits, being written in Rust and based on Apache Arrow. Polars offers a similar API to pandas, which lowers the barrier for migration. It handles large data sets more efficiently with its lazy API and better concurrency capabilities. Tools like PyCharm support Polars, smoothing the transition. The primary differences in syntax and migration tips are provided, ensuring a relatively seamless switch from pandas to Polars.

  13. Article
    KDnuggets · 2y

    Mastering Python for Data Science: Beyond the Basics

    Learn advanced Python techniques for data science, including efficient data manipulation with Pandas, high-performance computing with NumPy, and leveraging niche libraries for elevated data analysis.

  14. Article
    Machine Learning Mastery · 2y

    Beginning Data Science (7-day mini-course)

    This post provides a 7-day mini-course on beginning data science. It outlines the tools used in data science, the intended audience, and the lessons included in the course.

  15. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Accelerate Pandas 20x using FireDucks

    FireDucks is a highly optimized alternative to Pandas, boasting up to 20x performance improvements by leveraging multi-core CPU capabilities and lazy execution. With the same API as Pandas, FireDucks allows for seamless integration into existing Pandas pipelines by simply changing the import statement. The library is currently available for Linux x86_64, with versions for Windows and MacOS in development.

  16. Article
    JetBrains · 2y

    Polars vs. pandas: What’s the Difference?

    Polars is a powerful dataframe library built for speed and efficiency on a single machine, often outperforming pandas in memory usage and speed. Written in Rust and based on Apache Arrow, Polars offers features like safe concurrency and query optimization through lazy execution. Despite its performance advantages, Polars is less compatible than pandas with current data visualization and machine learning libraries.

  17. Article
    Planet Python · 2y

    Displaying Pandas DataFrames in the Terminal

    Learn how to use the textual-pandas package to display pandas DataFrames directly in your terminal with ease. The guide covers installation using pip and provides sample code to help you quickly get started with creating a Textual application that loads and displays a DataFrame in a table widget.

  18. Article
    KDnuggets · 2y

    How to Perform Memory-Efficient Operations on Large Datasets with Pandas

    Learn effective techniques to handle and perform memory-efficient operations on large datasets using Pandas. Tips include using the `low_memory` parameter when loading data, converting data types, processing data in chunks, and employing vectorized operations instead of `apply` with lambda functions. Additional suggestions include using `inplace=True` for DataFrame modifications and filtering data before performing operations.
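Two of the tips above, converting data types and processing data in chunks, can be sketched as follows. The tiny in-memory CSV and its `city` and `temp` columns are assumptions for illustration; with a real file you would pass a path to `pd.read_csv` and use a much larger `chunksize`.

```python
import io

import pandas as pd

# Hypothetical data standing in for a large CSV file.
csv_text = (
    "city,temp\n"
    "Oslo,10.5\n"
    "Lima,20.0\n"
    "Oslo,30.5\n"
    "Lima,15.0\n"
    "Oslo,24.0\n"
)

# Tip: convert data types. Repeated strings become a category,
# and floats are downcast to the smallest float type that fits.
df = pd.read_csv(io.StringIO(csv_text))
df["city"] = df["city"].astype("category")
df["temp"] = pd.to_numeric(df["temp"], downcast="float")

# Tip: process in chunks. Only one chunk is held in memory at a time,
# so running totals are accumulated instead of loading the whole file.
total = 0.0
count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["temp"].sum()
    count += len(chunk)
mean_temp = total / count

print(df.dtypes)
print(mean_temp)
```

`df.memory_usage(deep=True)` before and after the conversions is a quick way to check how much a given dtype change actually saves.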

  19. Article
    JetBrains · 2y

    Data Exploration With pandas

    Learn how to explore and understand data using pandas in PyCharm by leveraging summary statistics and graphical plots. Discover how to distinguish between continuous and categorical variables, generate summary statistics, and visualize data using histograms, box plots, bar charts, and scatter plots. Utilize JetBrains AI Assistant to generate relevant code snippets and enhance your data analysis workflow.

  20. Article
    Towards AI · 2y

    This Pandas Trick Will Blow Your Mind As a Data Scientist!

    Learn how to automate data analysis with Pandas through an 8-step process. The guide covers setting up your environment, uploading CSV files, and generating comprehensive reports with just one click. Essential libraries include Pandas, Numpy, Ipywidgets, Matplotlib, and Seaborn.

  21. Article
    Planet Python · 2y

    Using Pandas to Read JSON from URL

    Learn how to use Pandas in Python to read JSON data directly from a URL into a DataFrame. This tutorial covers a basic example and explains the key parameters of the `pd.read_json()` method, enabling customization of the data reading process.
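A minimal sketch of `pd.read_json()`. To keep the example runnable offline, the JSON payload is supplied through `io.StringIO`; the payload is made up, and the URL in the comment is a placeholder and not a real endpoint.

```python
import io

import pandas as pd

# Made-up payload in "records" orientation: a list of row objects.
payload = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'

# read_json accepts a path, URL, or file-like object.
df = pd.read_json(io.StringIO(payload), orient="records")
print(df)

# With a real endpoint, the URL is passed directly (placeholder shown):
# df = pd.read_json("https://example.com/data.json")
```

The `orient` parameter tells pandas how the JSON is laid out; `"records"` matches a list of per-row objects like the one above.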

  22. Article
    TigerData (Creators of TimescaleDB) · 2y

    Guide to Time-Series Analysis in Python

    Learn how Python can be used for time-series analysis, including loading and analyzing time-series data, plotting with Pyplot, and handling challenges of working with large datasets. Explore the advantages of Python, such as its simplicity, extensive library support, and code reusability. Discover specialized libraries like pandas, Matplotlib, tsfresh, and more for advanced time-series tasks. Gain insights into data cleaning, trend analysis, seasonality detection, forecasting models, and feature extraction for machine learning or deep learning algorithms.
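As a rough illustration of the trend and seasonality steps mentioned above, the sketch below uses pandas alone on a synthetic daily series with a repeating weekly pattern. The data is invented and this is not the guide's own code.

```python
import numpy as np
import pandas as pd

# Synthetic series: four weeks of daily values repeating 1..7, starting on a Monday.
index = pd.date_range("2024-01-01", periods=28, freq="D")
values = pd.Series(np.tile([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 4), index=index)

# Downsample to weekly means (weeks ending on Sunday).
weekly_mean = values.resample("W").mean()

# A 7-day rolling mean smooths out the weekly cycle and exposes the trend.
rolling = values.rolling(window=7).mean()

# Averaging by day of week (0 = Monday) reveals the seasonal pattern.
by_weekday = values.groupby(values.index.dayofweek).mean()

print(weekly_mean)
print(by_weekday)
```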

  23. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    How to Create a Calendar Plot in Python?

    Calendar plots are an effective way to visualize day-to-day variations in data over a longer period, typically a year. Using the `calplot()` method from the Plotly library, you can easily create a calendar plot with just two lines of Python code. This plot type is particularly useful for detecting weekly or monthly seasonality in data and can often reveal insights that traditional plots or aggregation methods might miss.

  24. Article
    Data Engineer Things · 2y

    Excel Isn’t Going Anywhere, So Let’s Automate Parsing It

    Automating Excel file parsing with Python and Pandas can significantly improve efficiency, consistency, and scalability in handling messy, manually filled Excel files. This guide provides a step-by-step process to read and extract specific table data, handle issues, and alert stakeholders about any problems encountered during parsing.

  25. Article
    freeCodeCamp · 2y

    How to Build a Production-Grade Movie Recommender in Python – A Machine Learning Handbook

    Learn how to build a movie recommendation system in Python using pandas, machine learning algorithms, and CountVectorizer for text pre-processing. The system analyzes movie descriptions, leverages cosine similarity to find similar movies, and provides personalized recommendations based on user preferences.
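The handbook uses scikit-learn's CountVectorizer. The sketch below swaps in a dependency-free stand-in, a word-count `Counter` per description, to show the underlying idea of ranking movies by cosine similarity. The movie titles, descriptions, stop-word list, and helper names are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue.
movies = {
    "Space Quest": "astronauts explore a distant planet in space",
    "Star Voyage": "a crew of astronauts travel through space",
    "Baking Love": "a baker finds romance in a small town",
}

STOP_WORDS = {"a", "in", "of", "the", "through"}


def bag_of_words(text):
    """Word counts for one description, ignoring a few stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {title: bag_of_words(desc) for title, desc in movies.items()}


def recommend(title, k=1):
    """Return the k titles whose descriptions are most similar to the given one."""
    scores = [
        (other, cosine(vectors[title], vectors[other]))
        for other in vectors
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


print(recommend("Space Quest"))
```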