Best of Statistics: June 2024

  1. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Poisson Regression vs. Linear Regression

    Linear regression is often a poor fit for count data because it can produce negative predictions, which make no sense for quantities such as the number of calls received. Poisson regression, a type of generalized linear model (GLM), is better suited to count responses: it assumes the response follows a Poisson distribution, keeps predictions non-negative, and accounts for the fact that outcomes are not distributed symmetrically around the mean.
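A minimal sketch of the contrast described above, using only the Python standard library. The call counts are made-up illustrative numbers, not data from the article, and the Poisson model is fitted with a hand-written Newton iteration (an assumption chosen for self-containment; a statistics library would normally do this step).

```python
import math

# Hypothetical data: hour index x and number of calls y (made-up counts).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 1, 1, 2, 2, 4, 6, 8, 13, 20]
n = len(x)

# Ordinary least squares, closed form for a single feature.
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
ols_slope = sxy / sxx
ols_intercept = y_mean - ols_slope * x_mean
ols_pred = [ols_intercept + ols_slope * xi for xi in x]

# Poisson regression with a log link: mu = exp(b0 + b1 * x),
# fitted by Newton's method on the Poisson log-likelihood.
b0, b1 = math.log(y_mean), 0.0
for _ in range(50):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Gradient of the log-likelihood with respect to (b0, b1).
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
    # Entries of the 2x2 (negated) Hessian.
    h00 = sum(mu)
    h01 = sum(mi * xi for xi, mi in zip(x, mu))
    h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
    det = h00 * h11 - h01 * h01
    # Newton step: solve the 2x2 system explicitly.
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det
poisson_pred = [math.exp(b0 + b1 * xi) for xi in x]

print("OLS prediction at x=0:", round(ols_pred[0], 2))
print("Poisson prediction at x=0:", round(poisson_pred[0], 2))
```

On this toy data the straight line dips below zero at the low end, while the Poisson predictions stay positive by construction because they are exponentials of a linear predictor.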

  2. Article
    Towards Data Science · 2y

    From Code to Insights: Software Engineering Best Practices for Data Analysts

    This post presents software engineering best practices for data analysts. Key lessons include writing readable code, automating repetitive tasks, mastering tools, managing environments, optimizing program performance, following the DRY principle, leveraging testing, using version control, seeking code reviews, and staying up to date.

  3. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Why is OLS Called an Unbiased Estimator?

    The post explains why the OLS (Ordinary Least Squares) estimator in linear regression is considered an unbiased estimator. It details the concept of unbiasedness, showing that the expected value of the OLS parameter estimates, when computed over many samples, equals the true population parameter. An important takeaway is not to confuse unbiasedness with always obtaining the true parameter value from a single sample; rather, it means that the average estimate over multiple samples will equal the true parameter.
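The idea of averaging over many samples can be checked with a small simulation. This is a sketch under assumed settings: the true intercept and slope, the noise level, the sample size, and the number of repetitions are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0  # assumed population parameters


def ols_slope(xs, ys):
    """Closed-form OLS slope for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))
    return sxy / sxx


def draw_sample(n=30, noise_sd=3.0):
    """Draw one sample from the assumed linear model with Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * xi + rng.gauss(0, noise_sd) for xi in xs]
    return xs, ys


# Estimate the slope on many independent samples.
slopes = [ols_slope(*draw_sample()) for _ in range(2000)]
mean_slope = sum(slopes) / len(slopes)

print("smallest single-sample slope:", round(min(slopes), 3))
print("largest single-sample slope:", round(max(slopes), 3))
print("average slope over all samples:", round(mean_slope, 3))
```

Individual estimates scatter on both sides of the true slope, but their average settles close to it, which is the sense in which the estimator is unbiased.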

  4. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 2y

    Even Two Outliers Can Distort Your Data Analysis

    Outliers can significantly distort the results of data analysis, such as correlation and regression fits, leading to misleading conclusions. Visualizing data through plots like PairPlot is crucial to identify these outliers and validate statistical measures.
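The effect of a couple of outliers on correlation can be illustrated with a toy dataset. The numbers below are invented for the example: twenty points constructed to have no linear relationship, plus two hypothetical extreme points.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    syy = sum((yi - y_mean) ** 2 for yi in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y repeats a fixed pattern, so it has no linear trend in x.
x_clean = list(range(1, 21))
y_clean = [5, 3, 6, 4] * 5

# Two hypothetical outliers far from the rest of the cloud.
outliers = [(40, 30), (45, 35)]
x_all = x_clean + [px for px, _ in outliers]
y_all = y_clean + [py for _, py in outliers]

r_clean = pearson_r(x_clean, y_clean)
r_with_outliers = pearson_r(x_all, y_all)

print("correlation without outliers:", round(r_clean, 3))
print("correlation with two outliers:", round(r_with_outliers, 3))
```

Two points out of twenty-two are enough to move the coefficient from essentially zero to a value that would usually be read as a strong relationship, which is why plotting the data before trusting the summary statistic matters.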