The Revenge of the Data Scientist – Hamel’s Blog

Data scientists are not obsolete in the LLM era — their core skills are more critical than ever. The post argues that most AI engineering teams are failing at evaluation fundamentals that data scientists have long mastered. Five recurring eval pitfalls are covered: using generic off-the-shelf metrics instead of application-specific ones, trusting unverified LLM judges without treating them as classifiers, poor experimental design (synthetic test sets not grounded in real data, Likert scales instead of binary pass/fail), bad data and label practices, and over-automating work that requires human judgment. Each pitfall maps directly to a data science fundamental: EDA, model evaluation, experimental design, data collection, and production ML monitoring. The core message is that calling an LLM API doesn't eliminate the need for rigorous data examination, hypothesis-driven metrics, and skeptical validation — it just changes the surface where that work happens.
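One way to read "treating LLM judges as classifiers" is to score the judge against human pass/fail labels before trusting its verdicts. The sketch below is an illustration under that reading, not code from the post: the function name `judge_agreement` and the example labels are hypothetical, and `True` is assumed to mean "pass".

```python
from typing import Sequence


def judge_agreement(human: Sequence[bool], judge: Sequence[bool]) -> dict[str, float]:
    """Score an LLM judge as a binary classifier against human pass/fail labels."""
    if len(human) != len(judge) or len(human) == 0:
        raise ValueError("need two non-empty label sequences of equal length")

    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)          # both say pass
    tn = sum(1 for h, j in pairs if not h and not j)  # both say fail
    fp = sum(1 for h, j in pairs if not h and j)      # judge passes a human-labeled failure
    fn = sum(1 for h, j in pairs if h and not j)      # judge fails a human-labeled pass

    return {
        # Share of human-labeled passes the judge also passes.
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of human-labeled failures the judge also fails.
        "tnr": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels for eight traces: True = pass, False = fail.
human_labels = [True, True, True, False, False, True, False, True]
judge_labels = [True, True, False, False, True, True, False, True]

scores = judge_agreement(human_labels, judge_labels)
print(scores)
```

Reporting the true positive and true negative rates separately, rather than accuracy alone, shows whether the judge errs mostly by passing bad outputs or by failing good ones, which a single agreement number can hide when failures are rare.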

#machine-learning

#data-science

#llm

Apr 02 • 9m read time • From hamel.dev

Table of contents

The Harness Is Data Science
Generic Metrics
Unverified Judges
Bad Experimental Design
Bad Data and Labels
Automating Too Much
Other Pitfalls
The Mapping
Video & Slides
Footnotes
