Part 2 of a hands-on reinforcement learning course series covering the formal foundations every RL algorithm is built on. Topics include the Markov property, MDPs as a 5-tuple, episodic vs. continuing tasks, returns and discounting, the reward hypothesis and reward hacking, deterministic and stochastic policies, state-value functions, and a complete Monte Carlo policy evaluation implementation on a 4×4 gridworld. The series contextualizes RL's growing relevance through its use in LLM post-training pipelines (RLHF, GRPO, constitutional AI) and agentic AI systems.
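The Monte Carlo policy evaluation mentioned above can be sketched as follows. This is a minimal first-visit Monte Carlo estimator on a 4×4 gridworld under an equiprobable random policy; the specific environment details here (terminal state in one corner, reward of -1 per step, undiscounted returns) are a common textbook setup and are assumptions, not necessarily the exact configuration used in the post.

```python
import random
from collections import defaultdict

# Assumed 4x4 gridworld: states 0..15, state 0 is terminal,
# reward -1 per step, equiprobable random policy.
N = 4
TERMINAL = 0
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Move in the grid; bumping into a wall leaves the state unchanged."""
    r, c = divmod(state, N)
    dr, dc = action
    nr = max(0, min(N - 1, r + dr))
    nc = max(0, min(N - 1, c + dc))
    return nr * N + nc, -1.0

def generate_episode(rng, max_steps=200):
    """Roll out the random policy until the terminal state (or a step cap)."""
    state = rng.randrange(1, N * N)  # start anywhere except the terminal
    episode = []
    for _ in range(max_steps):
        action = rng.choice(ACTIONS)
        next_state, reward = step(state, action)
        episode.append((state, reward))
        state = next_state
        if state == TERMINAL:
            break
    return episode

def mc_policy_evaluation(num_episodes=5000, gamma=1.0, seed=0):
    """First-visit Monte Carlo: average the return G_t observed the
    first time each state is visited in an episode."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode(rng)
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            # Only record G if this is the first visit to s in the episode.
            if s not in (x for x, _ in episode[:t]):
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

With enough episodes, states near the terminal corner converge to less negative values than distant states, since fewer -1 rewards accumulate before termination.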

2 min read · From blog.dailydoseofds.com
