The full RL nanodegree, covered with implementation.

Daily Dose of Data Science | Avi Chawla | Substack

Part 4 of an RL course series, covering model-free learning methods. Topics include Monte Carlo (MC) prediction and control, temporal-difference (TD) learning, the bias-variance tradeoff between MC and TD, SARSA, Q-learning, and maximization bias. The post includes a hands-on experiment, with code, comparing SARSA and Q-learning on the Cliff Walking gridworld. No prior RL background is required. It also motivates RL's relevance to modern AI problems such as LLM token generation, agentic pipelines, and RLHF/GRPO-based post-training.

Model-Free Learning in RL