...and an open-source framework that's making it happen.

Daily Dose of Data Science | Avi Chawla | Substack

RULER, implemented in the open-source OpenPipe ART framework, addresses Andrej Karpathy's critique of scalar reward functions in RL. It lets developers define reward criteria in plain English and uses an LLM to evaluate agent trajectories instead of hand-coded scoring functions. This mirrors the evolution from RLHF to GRPO, and now to natural language rewards, effectively turning RL reward engineering into prompt engineering. A demo trains a Qwen3 1.4B agent to play 2048 using this approach.
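To make the idea of plain-English rewards concrete, here is a minimal sketch in Python. It is not the ART or RULER API: `RUBRIC`, `stub_judge`, `group_relative_advantages`, and `score_group` are hypothetical names invented for illustration, and the judge is a deterministic stub standing in for a real LLM call. The only part borrowed from GRPO is the general idea of normalizing scores within a group of trajectories.

```python
from statistics import mean, pstdev
from typing import Callable, List

# Reward criteria written in plain English (illustrative wording, not from ART).
RUBRIC = "Prefer 2048 trajectories that reach larger tiles."

# A judge takes the rubric and a group of trajectories and returns
# one score in [0, 1] per trajectory.
Judge = Callable[[str, List[str]], List[float]]


def stub_judge(rubric: str, trajectories: List[str]) -> List[float]:
    """Hypothetical stand-in for an LLM judge.

    A real judge would send the rubric and trajectories to a model.
    This stub ignores the rubric and scores each trajectory by the
    largest integer token it contains, scaled by 2048 and capped at 1.
    """
    scores = []
    for trajectory in trajectories:
        tiles = [int(tok) for tok in trajectory.split() if tok.isdigit()]
        scores.append(min(max(tiles, default=0) / 2048, 1.0))
    return scores


def group_relative_advantages(scores: List[float]) -> List[float]:
    """Center and scale scores within one group (GRPO-style normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # Every trajectory scored the same, so there is no learning signal.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def score_group(judge: Judge, rubric: str, trajectories: List[str]) -> List[float]:
    """Score a group of trajectories with the judge and return advantages."""
    scores = judge(rubric, trajectories)
    if len(scores) != len(trajectories):
        raise ValueError("judge must return one score per trajectory")
    return group_relative_advantages(scores)


if __name__ == "__main__":
    group = ["max tile 256", "max tile 512", "max tile 128"]
    print(score_group(stub_judge, RUBRIC, group))
```

Swapping `stub_judge` for a function that prompts an actual model would leave the rest unchanged, which is the sense in which reward engineering starts to look like prompt engineering: the training signal changes when the rubric text changes, not when scoring code is rewritten.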

Karpathy’s Prediction About RL is Coming True Now!