I recently came across a post by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines¹. Many statisticians of the more old-school type seemed to disagree. T...

R-bloggers

A statistician argues that logistic regression is correctly named: it is a regression model, not a classification model. The key insight is that regression should be defined by what is predicted, namely the conditional expectation E[Y|X], rather than by whether the response variable is numeric or categorical. For a binary outcome coded 0/1, E[Y|X] = P(Y=1|X), and that probability is exactly what logistic regression predicts, so it is genuinely a regression. Classification happens only when a threshold is applied on top of those predicted probabilities. The post also critiques the data science convention of separating regression from classification purely by the type of response variable. It notes that an overfitted decision tree predicting numeric values shows no regression-to-the-mean effect and arguably should not be called a regression at all. R code examples illustrate both logistic regression and the regression-to-the-mean phenomenon.
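To make the two-step picture concrete, here is a minimal sketch in Python. The post's own examples are in R and are not reproduced here. The sketch fits P(Y=1|x) = sigmoid(b0 + b1·x) by maximum likelihood using Newton-Raphson on a small made-up dataset. The data, the helper names `fit_logistic` and `classify`, and the 0.5 cutoff are illustrative assumptions. Because Y only takes the values 0 and 1, E[Y|x] = 1·P(Y=1|x) + 0·P(Y=0|x) = P(Y=1|x), so the fitted probabilities are estimated conditional means. Turning them into labels is a separate decision rule.

```python
import math


def sigmoid(z):
    # Logistic function: maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Hypothetical helper: fit P(Y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood): X^T (y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X^T W X with weights w_i = p_i * (1 - p_i)
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: solve the 2x2 system (X^T W X) delta = X^T (y - p)
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def classify(probs, threshold=0.5):
    # Hypothetical helper: the classification step is a rule applied AFTER the regression.
    return [1 if p >= threshold else 0 for p in probs]


# Illustrative toy data, made up for this sketch (classes overlap, so the MLE exists).
x = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)

# Regression step: estimated conditional means E[Y|x] = P(Y=1|x).
probs = [sigmoid(b0 + b1 * xi) for xi in x]

# Classification step: threshold the estimated means to get 0/1 labels.
labels = classify(probs)

print("coefficients:", round(b0, 4), round(b1, 4))
print("mean fitted probability:", sum(probs) / len(probs), "vs mean of y:", sum(y) / len(y))
print("labels:", labels)
```

With an intercept in the model, the maximum-likelihood fit forces the average fitted probability to equal the observed proportion of ones, much as least-squares fitted values average to the mean of y. Changing the threshold changes the labels but leaves the fitted regression untouched.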

Is logistic regression regression?