Best of Daily Dose of Data Science | Avi Chawla | Substack
September 2024

  1. 15 DS/ML Cheat Sheets

    This post collates 15 cheat sheets covering essential data science and machine learning concepts. It includes resources on translating between different data manipulation libraries, multi-GPU training strategies, testing ML models in production, neural network optimization, and more. Links to detailed articles are provided for further reading.

  2. CPython vs. Cython: How to Speed-up Native Python Programs

    Learn how Cython speeds up Python by compiling Python code (optionally annotated with C types) into C, which yields significant speedups and lower memory overhead. The post contrasts CPython, which interprets dynamically typed code and resolves types at run time, with Cython, which restricts Python's dynamism through explicit static typing. The guide walks through practical steps for using Cython in a Jupyter Notebook, reporting a speedup of over 100x on the example code.
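
    The dynamism described above can be observed with nothing but the standard library. The sketch below is illustrative and not taken from the post: the hypothetical function `add` carries no type information, so CPython compiles it to one generic add instruction and checks the operand types every time it runs. Cython's typed `cdef` declarations are what allow it to replace this runtime dispatch with plain C arithmetic.

```python
import dis


def add(a, b):
    # No type information: CPython checks the operand types at run time,
    # every time this line executes.
    return a + b


# One function, one bytecode sequence, many operand types.
int_result = add(2, 3)
float_result = add(2.5, 0.5)
str_result = add("Cy", "thon")

# The body compiles to a single generic add opcode (BINARY_ADD on
# Python <= 3.10, BINARY_OP on 3.11+), not one opcode per type.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(int_result, float_result, str_result)
print(opnames)
```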

  3. Accelerate Pandas 20x using FireDucks

    FireDucks is a highly optimized alternative to pandas that claims up to 20x speedups by using multiple CPU cores and lazy execution. Because it exposes the same API as pandas, it can be dropped into an existing pandas pipeline by changing only the import statement. At the time of writing, the library is available for Linux x86_64, with Windows and macOS versions in development.
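
    A minimal sketch of the import swap described above, assuming FireDucks' `fireducks.pandas` module as the drop-in replacement. The fallback to stock pandas (so the snippet also runs where FireDucks is unavailable) and the toy sales data are illustrative additions, not part of the post.

```python
# Try the FireDucks drop-in first; fall back to stock pandas where
# FireDucks is not installed (e.g. on Windows or macOS).
try:
    import fireducks.pandas as pd
except ImportError:
    import pandas as pd

# The pipeline itself is plain pandas API and does not change.
df = pd.DataFrame({"city": ["A", "B", "A", "C", "B"],
                   "sales": [10, 20, 30, 40, 50]})
totals = df.groupby("city")["sales"].sum()
print(totals)
```

    Only the import line differs between the two libraries; everything after it is ordinary pandas code.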

  4. A Crash Course on Graph Neural Networks — Part 3

    Part 3 of the crash course on Graph Neural Networks (GNNs) covers advanced graph learning methods and several feature engineering techniques, with implementation details. The course aims to be a beginner-friendly introduction to GNNs: it highlights their importance in ML applications at large tech companies and outlines the benefits and challenges of working with graph data. Key topics include GNN tasks, data challenges, frameworks, advanced architectures, and practical demos.
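
    As a rough illustration of the operation many GNN layers build on, the plain-Python sketch below performs one round of mean neighborhood aggregation (message passing) and also computes node degree, a common hand-crafted graph feature. The function names, the choice to include each node's own features in the mean (similar to adding self-loops), and the toy graph are assumptions for illustration, not code from the course.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency lists from (u, v) edge pairs."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return neighbors


def mean_aggregate(edges, features):
    """One message-passing round: average each node's features with its neighbors'."""
    neighbors = build_adjacency(edges)
    updated = {}
    for node, x in features.items():
        group = [node] + neighbors[node]  # include the node itself (self-loop)
        updated[node] = [
            sum(features[n][d] for n in group) / len(group) for d in range(len(x))
        ]
    return updated


def degree_feature(edges, nodes):
    """Node degree, a simple structural feature for graph feature engineering."""
    neighbors = build_adjacency(edges)
    return {n: len(neighbors[n]) for n in nodes}


# Toy path graph 0 - 1 - 2 with a single feature per node.
edges = [(0, 1), (1, 2)]
features = {0: [1.0], 1: [2.0], 2: [3.0]}

h = mean_aggregate(edges, features)
deg = degree_feature(edges, features)
print(h)    # {0: [1.5], 1: [2.0], 2: [2.5]}
print(deg)  # {0: 1, 1: 2, 2: 1}
```

    Stacking several such rounds, with learned weight matrices and nonlinearities between them, is what lets information propagate beyond a node's immediate neighbors.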

  5. How to Inspect Decision Trees After Training with PCA

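    Decision trees split on one feature at a time, so their decision boundaries are made of axis-aligned (perpendicular) segments. A diagonal boundary must therefore be approximated by a staircase of many splits, which deepens the tree and invites overfitting. Running PCA before fitting rotates the data onto orthogonal principal axes; when the class boundary lines up with one of those axes, the tree can separate the classes with fewer splits, potentially reducing its depth and improving performance. The trade-off is interpretability: each PCA component is a linear combination of the original features, so splits on it are harder to explain, which can be a limitation in some cases. Careful feature engineering may still be needed for better model performance.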
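
    The effect can be reproduced on a toy problem in plain Python. The sketch below is a hypothetical illustration rather than the post's code: it swaps a full decision tree for a depth-1 tree (a single threshold split, or "stump"), hand-rolls a 2-D PCA, and uses synthetic points whose class boundary is the diagonal y = x. On the raw features no single axis-aligned split does much better than chance, while after PCA one split on the second component separates the classes perfectly.

```python
import math


def stump_accuracy(values, labels):
    """Best accuracy of a single threshold split on one feature (a depth-1 tree)."""
    pts = sorted(set(values))
    thresholds = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    n = len(labels)
    best = 0.0
    for thr in thresholds:
        # Rule "predict 1 if value > thr"; the flipped rule scores n - hits.
        hits = sum((v > thr) == bool(y) for v, y in zip(values, labels))
        best = max(best, hits / n, (n - hits) / n)
    return best


def pca_2d(points):
    """Project centered 2-D points onto their principal axes (PC1, PC2)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / n  # var(x)
    c = sum(y * y for _, y in centered) / n  # var(y)
    b = sum(x * y for x, y in centered) / n  # cov(x, y)
    theta = 0.5 * math.atan2(2 * b, a - c)   # angle of the max-variance axis
    u1 = (math.cos(theta), math.sin(theta))
    u2 = (-math.sin(theta), math.cos(theta))  # orthogonal to u1
    return [(x * u1[0] + y * u1[1], x * u2[0] + y * u2[1]) for x, y in centered]


# Synthetic data: points straddle the diagonal y = x; class 1 lies below it.
points, labels = [], []
for t in range(-10, 11):
    for s in (-1, 1):
        points.append((t + 0.25 * s, t - 0.25 * s))
        labels.append(1 if s == 1 else 0)

raw_acc = max(stump_accuracy([p[i] for p in points], labels) for i in (0, 1))
pcs = pca_2d(points)
pca_acc = max(stump_accuracy([p[i] for p in pcs], labels) for i in (0, 1))
print(f"raw features: {raw_acc:.3f}, PCA components: {pca_acc:.3f}")
# raw features: 0.524, PCA components: 1.000
```

    Here the boundary happens to align with the minor principal axis by construction; on real data PCA offers no such guarantee, which is why it is a heuristic rather than a fix.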

  6. 15 Ways to Optimize Neural Network Training (With Implementation)

    Discover 15 techniques for optimizing neural network training, each with code examples. Knowing when and how to apply them helps ML engineers run training efficiently, cut operational costs, and add genuine value. The post stresses identifying bottlenecks first, selecting techniques that address them, and weighing trade-offs and hardware limitations.
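
    One technique commonly used for this purpose is gradient accumulation, which simulates a large batch by summing gradients from several smaller micro-batches before each weight update, so peak memory is governed by the micro-batch size. The plain-Python sketch below is not taken from the post and is not claimed to match its implementation; it uses a hypothetical one-parameter model y ≈ w * x with mean squared error and shows that, for equal-size micro-batches, the accumulated gradient matches the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """dL/dw for L = mean((w * x - y) ** 2) over the given samples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
w, lr = 0.5, 0.01

# Reference: gradient over the full batch of 8 samples in one pass.
full_grad = grad_mse(w, xs, ys)

# Gradient accumulation: 4 micro-batches of 2 samples. Dividing each
# micro-batch gradient by the number of micro-batches makes the sum equal
# the full-batch mean (this relies on the micro-batches being equal in size).
micro = 2
num_micro = len(xs) // micro
accum_grad = 0.0
for i in range(0, len(xs), micro):
    accum_grad += grad_mse(w, xs[i:i + micro], ys[i:i + micro]) / num_micro

# A single optimizer step is taken only after all micro-batches are processed.
w_new = w - lr * accum_grad
print(full_grad, accum_grad, w_new)
```

    In a deep learning framework the same idea amounts to running backward passes on several micro-batch losses, each scaled by 1/num_micro, and letting their gradients accumulate before a single optimizer step.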