Best of Daily Dose of Data Science | Avi Chawla | Substack · July 2024

  1. 9 Python Command Line Flags

    Discover the 9 most common Python command line flags and how they modify the behavior of the Python interpreter: `-c` runs a command passed as a string, `-i` enters interactive mode after a script finishes, `-O` optimizes by stripping `assert` statements, and `-OO` additionally strips docstrings. The post also covers `-W` for controlling warnings (for example, `-W ignore` suppresses them), `-m` for running a module as a script, `-v` for verbose mode (a message each time a module is imported), `-x` for skipping the first line of a script, and `-E` for ignoring `PYTHON*` environment variables.
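As a rough sketch, assuming CPython 3.7 or later, the snippet below launches the current interpreter (`sys.executable`) as a subprocess with eight of these flags and captures what each one changes. The `run` helper and the sample commands are illustrative, not taken from the post; `-i` is left out because it needs an interactive terminal.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def run(*args: str, stdin: Optional[str] = None) -> "subprocess.CompletedProcess[str]":
    """Hypothetical helper: run this interpreter with the given flags, raising if it fails."""
    return subprocess.run(
        [sys.executable, *args],
        input=stdin,
        capture_output=True,
        text=True,
        check=True,
    )


# -c: execute a command string instead of a script file
c_out = run("-c", "print(2 + 3)").stdout.strip()

# -O: assert statements are stripped, so the failing assert never fires
o_out = run("-O", "-c", "assert False, 'boom'; print('asserts skipped')").stdout.strip()

# -OO: docstrings are stripped as well, so __doc__ becomes None
oo_out = run("-OO", "-c", "def f():\n    '''doc'''\nprint(f.__doc__)").stdout.strip()

# -W ignore: the warning is suppressed, so nothing is written to stderr
w_err = run("-W", "ignore", "-c", "import warnings; warnings.warn('hi')").stderr

# -m: run a standard-library module as a script (json.tool pretty-prints stdin)
m_out = run("-m", "json.tool", stdin='{"a": 1}').stdout

# -v: verbose mode reports module imports on stderr
v_err = run("-v", "-c", "pass").stderr

# -E: PYTHON* environment variables are ignored, which sys.flags records
e_flag = run("-E", "-c", "import sys; print(sys.flags.ignore_environment)").stdout.strip()

# -x: the first line of the script is skipped, so invalid syntax there is harmless
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "skip_first_line.py"
    script.write_text("this first line is not valid Python\nprint('ran')\n")
    x_out = run("-x", str(script)).stdout.strip()

print(c_out, o_out, oo_out, repr(w_err), e_flag, x_out, sep="\n")
print(m_out)
```

Because `check=True` is set, dropping `-O` from the assert example would raise `CalledProcessError`, since the assert then fails inside the child process.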

  2. GROUPING SETS in SQL

    Learn how to run multiple aggregations in a single SQL query with GROUPING SETS, which scans the table just once. This is more efficient than combining separate `GROUP BY` queries with UNION, where each query scans the table again. The post provides a detailed example and a link to a Jupyter Notebook for hands-on practice.
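GROUPING SETS is a SQL feature; to keep every example here in Python, the sketch below emulates `GROUP BY GROUPING SETS ((region), (product), ())` over an in-memory list of rows. It is a hypothetical helper, not a database API, and the `sales` data and column names are invented for illustration. What it shows is that every grouping is updated during one pass over the rows, whereas a UNION of three `GROUP BY` queries would read the data three times. Unlike SQL, which fills columns outside a grouping set with NULL, each result key here holds only that set's columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Sequence, Tuple

Row = Dict[str, Any]
# (grouping-set columns, values of those columns) -> aggregated total
GroupKey = Tuple[Tuple[str, ...], Tuple[Any, ...]]

# Roughly equivalent SQL (hypothetical `sales` table):
#   SELECT region, product, SUM(amount)
#   FROM sales
#   GROUP BY GROUPING SETS ((region), (product), ());


def grouping_sets_sum(
    rows: Iterable[Row], grouping_sets: Sequence[Tuple[str, ...]], measure: str
) -> Dict[GroupKey, float]:
    """Sum `measure` for every grouping set while reading each row exactly once."""
    totals: Dict[GroupKey, float] = defaultdict(float)
    for row in rows:  # one pass over the data, analogous to one table scan
        for columns in grouping_sets:
            key = tuple(row[c] for c in columns)  # () for the grand-total set
            totals[(columns, key)] += row[measure]
    return dict(totals)


sales = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 5},
    {"region": "US", "product": "A", "amount": 7},
]

result = grouping_sets_sum(sales, [("region",), ("product",), ()], "amount")
for (columns, values), total in result.items():
    print(columns or "(grand total)", values, total)
```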

  3. Free Daily Dose of Data Science Archive

    The 2024 edition of the Daily Dose of Data Science archive has been released. It organizes posts by key data science and machine learning topics, includes a 2-minute assessment that recommends relevant chapters, and focuses on practical, no-fluff content. The edition aims to help readers learn efficiently and substantially improve their skills.

  4. Improve Matplotlib Plot Quality

    By default, Matplotlib plots in Jupyter Notebook are rendered as PNG (raster) images, so they can look dull and blurry when scaled. A useful hack is to render them as SVGs (Scalable Vector Graphics) instead, which keeps plots sharp even when zoomed. To enable this in a notebook, use either `from matplotlib_inline.backend_inline import set_matplotlib_formats` followed by `set_matplotlib_formats('svg')`, or `%config InlineBackend.figure_format = 'svg'`.
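A minimal sketch, assuming Matplotlib is installed: outside Jupyter there is no inline backend, so this renders one figure to in-memory PNG and SVG buffers with `savefig` to show the difference described above. PNG output is a fixed grid of pixels, while SVG output is XML text describing lines and shapes, which the viewer redraws at any zoom level. The notebook switches themselves appear only as comments, since `%config` is IPython syntax rather than Python.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

# In a notebook, either of these switches inline output to SVG:
#   from matplotlib_inline.backend_inline import set_matplotlib_formats
#   set_matplotlib_formats("svg")
#   %config InlineBackend.figure_format = 'svg'

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
ax.set_title("y = x^2")

# Render the same figure twice: raster PNG (the inline default) and vector SVG
png_buf, svg_buf = io.BytesIO(), io.BytesIO()
fig.savefig(png_buf, format="png")
fig.savefig(svg_buf, format="svg")
plt.close(fig)

png_bytes = png_buf.getvalue()                 # binary pixel data
svg_text = svg_buf.getvalue().decode("utf-8")  # XML markup of paths and text

print(png_bytes[:8])
print(svg_text[:120])
```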

  5. Automated EDA Tool Stack

    Discover eight automated EDA (Exploratory Data Analysis) tools: Sweetviz, ydata-profiling, DataPrep, AutoViz, D-Tale, dabl, QuickDA, and Lux. These tools automate repetitive EDA tasks such as plotting the response (target) variable, checking class imbalance, running correlation analysis, and analyzing missing values, which reduces human error and produces standardized reports across projects. Each tool offers its own features and integrates with common data science environments such as Jupyter Notebook.
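To make the list of automated tasks concrete, here is a hypothetical, pure-Python sketch of three checks these tools bundle into their reports: missing-value counts, class balance of the target, and pairwise Pearson correlation. It is not the API of Sweetviz, ydata-profiling, or any other listed tool; the helper names and the toy `customers` data are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def missing_counts(rows: Sequence[Row]) -> Dict[str, int]:
    """Count missing (None) values in every column."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            counts[column] += value is None  # adds 1 for None, 0 otherwise
    return dict(counts)


def class_balance(rows: Sequence[Row], target: str) -> Dict[Any, float]:
    """Share of each target class, for spotting imbalance."""
    labels = Counter(row[target] for row in rows)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def quick_eda(rows: Sequence[Row], target: str, numeric: List[str]) -> Dict[str, Any]:
    """Hypothetical one-call report bundling the checks above."""
    correlations: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(numeric, 2):
        # Use only rows where both values are present (assumes at least two such rows)
        pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
        xs, ys = zip(*pairs)
        correlations[(a, b)] = pearson(xs, ys)
    return {
        "missing": missing_counts(rows),
        "class_balance": class_balance(rows, target),
        "correlations": correlations,
    }


customers = [
    {"age": 20, "income": 40, "churn": "no"},
    {"age": 30, "income": 60, "churn": "no"},
    {"age": 40, "income": None, "churn": "yes"},
    {"age": 50, "income": 100, "churn": "no"},
]

report = quick_eda(customers, target="churn", numeric=["age", "income"])
print(report)
```

The listed libraries produce far richer versions of these checks, including plots and interactive reports.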

  6. 5 Cross Validation Techniques Explained Visually

    Evaluating a model on a single validation set can produce an overly optimistic estimate of its performance; cross validation gives a more reliable estimate by evaluating the model across several train/validation splits. This guide visually explains five key techniques: Leave-One-Out, K-Fold, Rolling, Blocked, and Stratified Cross Validation. Each suits different data: Rolling and Blocked cross validation respect the temporal order of time series, while Stratified cross validation keeps class proportions consistent across folds.
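As a minimal sketch, the index generators below implement four of the five schemes over sample positions `0..n-1`: K-Fold, Leave-One-Out (K-Fold with `k = n`), Rolling, and Stratified. Blocked cross validation is omitted because its layout depends on the chosen block size and gap. The Rolling version assumes an expanding training window that ends right before each validation block, and all function names are hypothetical. In practice, scikit-learn provides `KFold`, `LeaveOneOut`, `StratifiedKFold`, and `TimeSeriesSplit` for these jobs.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def k_fold(n: int, k: int) -> Iterator[Split]:
    """Unshuffled K-Fold: contiguous folds, each sample validated exactly once."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over early folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def leave_one_out(n: int) -> Iterator[Split]:
    """Leave-One-Out: K-Fold where every fold holds a single sample."""
    return k_fold(n, n)


def rolling(n: int, n_splits: int) -> Iterator[Split]:
    """Time-ordered splits: train only on the past, validate on the next block."""
    block = n // (n_splits + 1)  # any leftover samples at the end go unused
    for i in range(1, n_splits + 1):
        train = list(range(0, i * block))  # expanding window (an assumption)
        val = list(range(i * block, (i + 1) * block))
        yield train, val


def stratified_k_fold(labels: Sequence[Hashable], k: int) -> Iterator[Split]:
    """Deal each class's indices round-robin so every fold keeps the class ratios."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    for fold in folds:
        val = sorted(fold)
        held_out = set(val)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, val


for train, val in stratified_k_fold(["a", "a", "a", "a", "b", "b"], k=2):
    print("train:", train, "val:", val)
```

With four `"a"` labels and two `"b"` labels, each of the two stratified folds validates on two `"a"` samples and one `"b"` sample, matching the 2:1 ratio of the full dataset.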