Best of Daily Dose of Data Science | Avi Chawla | Substack · August 2024

  1. A Crash Course on Graph Neural Networks

    Graph Neural Networks (GNNs) extend deep learning techniques to graph data, addressing the limitations of traditional models in capturing complex relationships. This piece covers the basics, benefits, tasks, data challenges, frameworks, and practical implementation of GNNs.
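The message-passing idea at the core of most GNN layers can be sketched in a few lines of plain Python (an illustrative toy, not the article's code; the graph, features, and function names here are invented):

```python
# A toy undirected graph as an adjacency list: node -> neighbors.
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# One 2-dimensional feature vector per node.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.5, 0.5]}

def message_passing_step(graph, features):
    """Update each node by averaging its own and its neighbors' features.

    Real GNN layers add learned weight matrices and nonlinearities around
    this aggregation; the neighborhood averaging itself is the key step.
    """
    updated = {}
    for node, neighbors in graph.items():
        vectors = [features[node]] + [features[n] for n in neighbors]
        updated[node] = [sum(dim) / len(vectors) for dim in zip(*vectors)]
    return updated

features = message_passing_step(graph, features)
```

Stacking several such steps lets information propagate across multiple hops, which is how GNNs capture relationships that flat tabular models miss.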

  2. A Simple Implementation of Boosting Algorithm

    Boosting is a machine learning technique where each successive model attempts to correct the errors of its predecessor, leading to improved performance. Key design choices include how each tree is constructed, which loss function is used, and how each tree's contribution is weighted. A step-by-step example using the scikit-learn decision tree regressor shows how boosting works and the incremental improvement in R² scores. Boosting algorithms are particularly significant for tabular data in machine learning.
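The residual-fitting loop at the heart of boosting can be sketched without any library (a toy 1-D version with constant-valued stumps, not the article's scikit-learn example; the data and function names are invented):

```python
def fit_stump(x, residuals):
    """Find the threshold split of x that minimizes squared error
    when each side is predicted by its mean residual."""
    best = None
    for t in x:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not right:  # degenerate split with everything on one side
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_stages=10, lr=0.5):
    """Each stage fits a stump to the current residuals and adds a
    shrunken copy of its predictions to the running ensemble."""
    pred = [0.0] * len(y)
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return pred

# Toy data: two flat regimes with a little noise.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 7.0, 7.3, 6.8]
pred = boost(x, y)
```

Each stage only has to explain what the previous stages missed, which is why the fit (and the R² score) improves incrementally.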

  3. 10 Regression and Classification Loss Functions

    This post highlights the most commonly used loss functions in regression and classification tasks. It covers Mean Bias Error, Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, Huber Loss, and Log Cosh Loss for regression. For classification, it discusses Binary Cross Entropy, Hinge Loss, Cross-Entropy Loss, and KL Divergence. Each loss function is briefly explained along with its pros and cons.
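Several of these losses are simple enough to write out directly (standard textbook formulas; the pure-Python implementations below are illustrative, not taken from the post):

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: heavily penalizes large residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for residuals within delta, linear beyond,
    which makes it less sensitive to outliers than MSE."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r ** 2 if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """BCE: negative log-likelihood of the true 0/1 labels under the
    predicted probabilities, clipped away from 0 and 1 for stability."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

Comparing Huber against MSE on the same residual makes the trade-off concrete: for a residual of 2 with delta=1, MSE contributes 4 while Huber contributes only 1.5.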

  4. Why Join() Is Faster Than Iteration?

    Using Python’s join() method for string concatenation is significantly faster than building a string by repeated appending in a loop. join() knows the number of strings and separators in advance, so it can compute the total output size and allocate memory in a single call, whereas repeated concatenation triggers a fresh allocation for each element and separator. This optimization improves both runtime and memory utilization.
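A minimal benchmark of the two approaches (an illustrative setup of my own; the word list and repetition counts are arbitrary):

```python
import timeit

words = ["data"] * 10_000

def with_join():
    # One pass: the total size is known up front, memory allocated once.
    return ", ".join(words)

def with_loop():
    # Repeated concatenation: each += may copy and reallocate the string.
    out = ""
    for w in words:
        out += w + ", "
    return out

join_time = timeit.timeit(with_join, number=50)
loop_time = timeit.timeit(with_loop, number=50)
print(f"join: {join_time:.4f}s  loop: {loop_time:.4f}s")
```

The exact ratio depends on the interpreter and string sizes, but the single up-front allocation is what gives join() its edge.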

  5. Use SQL "NOT IN" With Caution

    SQL's NOT IN clause can produce unexpected results when the subquery returns NULL values. NOT IN expands into a chain of <> comparisons joined by AND, and any comparison with NULL evaluates to UNKNOWN, so the whole predicate can never be TRUE and the query silently returns no rows. To avoid this issue, filter NULL values out in the subquery or use anti joins as alternatives.
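The pitfall is easy to reproduce with Python's built-in sqlite3 module (an illustrative setup; the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (customer_id INTEGER);
    CREATE TABLE blocked (customer_id INTEGER);
    INSERT INTO orders  VALUES (1), (2), (3);
    INSERT INTO blocked VALUES (2), (NULL);
""")

# Surprise: returns NO rows. `id NOT IN (2, NULL)` expands to
# `id <> 2 AND id <> NULL`, and `id <> NULL` is UNKNOWN for every id.
naive = conn.execute(
    "SELECT customer_id FROM orders "
    "WHERE customer_id NOT IN (SELECT customer_id FROM blocked)"
).fetchall()

# Fix 1: strip NULLs inside the subquery.
filtered = conn.execute(
    "SELECT customer_id FROM orders WHERE customer_id NOT IN "
    "(SELECT customer_id FROM blocked WHERE customer_id IS NOT NULL) "
    "ORDER BY customer_id"
).fetchall()

# Fix 2: an anti join via NOT EXISTS, which handles NULLs correctly
# because the NULL row never satisfies the equality inside EXISTS.
anti = conn.execute(
    "SELECT o.customer_id FROM orders o WHERE NOT EXISTS "
    "(SELECT 1 FROM blocked b WHERE b.customer_id = o.customer_id) "
    "ORDER BY o.customer_id"
).fetchall()
```

Both fixes return customers 1 and 3; the naive NOT IN silently returns nothing.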

  6. The Evolution of Embeddings

    The post discusses the evolution of embeddings in natural language processing. It traces the shift from static embeddings like GloVe and Word2Vec to contextualized embeddings produced by Transformer models such as BERT, DistilBERT, and ALBERT. The latter generate context-aware representations, addressing the limitation that a word's meaning changes with its context. Examples and comparisons illustrate how these models capture word semantics and syntax more effectively.
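The one-vector-per-word limitation can be illustrated with a deliberately crude toy (invented vectors and a naive neighbor-averaging "contextualizer" — nothing like a real Transformer, just a sketch of why context changes the representation):

```python
# A static embedding table: every occurrence of "bank" gets this one vector.
static = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "money": [0.0, 1.0],
}

def contextual(sentence, word_index, weight=0.5):
    """Blend a word's static vector with the mean of its neighbors'
    vectors, so the same word gets different vectors in different
    sentences. Real contextual models do this with attention instead."""
    target = static[sentence[word_index]]
    neighbors = [static[w] for i, w in enumerate(sentence) if i != word_index]
    mean = [sum(dim) / len(neighbors) for dim in zip(*neighbors)]
    return [(1 - weight) * t + weight * m for t, m in zip(target, mean)]

v1 = contextual(["river", "bank"], 1)  # "bank" pulled toward "river"
v2 = contextual(["money", "bank"], 1)  # "bank" pulled toward "money"
```

A static lookup would return the same vector for "bank" in both sentences; even this crude mixing already separates the two senses, which is the property BERT-style models deliver at scale.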