A data scientist at monday.com reflects on joining an AI agent team where there was no model to train, no Python, and no traditional ML workflow. The post argues that the data scientist's role in the agentic era shifts from model training to systematic evaluation and quality ownership. Key responsibilities include building error taxonomies from agent traces, curating golden datasets with real production failures, calibrating LLM-as-judge systems using inter-rater agreement metrics, and creating deterministic graders for structured outputs. The author introduces 'Evaluation-Driven Development' as the new feedback loop replacing model.fit(), and warns against the 'sprint velocity trap' where teams confuse shipping with improving. Context engineering is positioned as the new feature engineering, and the core data science value is framed as language-agnostic methodological rigor applied to understanding system behavior through data.
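One of the techniques the post names, calibrating an LLM-as-judge against human labels with an inter-rater agreement metric, can be sketched with Cohen's kappa. This is an illustrative implementation, not code from the post; the `human` and `judge` label lists are hypothetical examples of pass/fail verdicts on agent traces.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts: human annotator vs. LLM judge on ten agent traces.
human = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
judge = ["pass", "fail", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(round(cohens_kappa(human, judge), 3))  # → 0.583
```

A kappa well below the raw agreement rate signals that the judge's "agreement" is partly chance, which is exactly why the post argues for calibration rather than trusting accuracy alone.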

13 min read · From engineering.monday.com
Table of contents
The Empty Notebook Is the Point
The Real Problem Isn't Building Agents. It's Knowing If They Work
Evaluation-Driven Development: Your New Training Loop
What the Work Actually Looks Like
The Sprint Velocity Trap
Where DS Ends and Engineering Begins
So Is It All About Evals?
The Quiet Case for Measurement
