Best of Daily Dose of Data Science | Avi Chawla | Substack — July 2025

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·38w
4 Stages of Training LLMs from Scratch
Training large language models from scratch involves four key stages: pre-training on massive text corpora to learn language basics, instruction fine-tuning to make models conversational and follow commands, preference fine-tuning using human feedback (RLHF) to align with human preferences, and reasoning fine-tuning for mathematical and logical tasks using correctness as a reward signal. Each stage builds upon the previous one to create increasingly capable and aligned AI systems.
104
2
2
Article
Daily Dose of Data Science | Avi Chawla | Substack·40w
MCP Integration with 4 Popular Agentic Frameworks
Part 8 of an MCP crash course demonstrates how to integrate Model Context Protocol with four popular agentic frameworks: LangGraph, CrewAI, LlamaIndex, and PydanticAI. The tutorial provides step-by-step practical walkthroughs for connecting MCP to each framework, along with detailed implementations. This builds on previous parts covering MCP fundamentals, custom client development, tools/resources/prompts, sampling integration, and security considerations including testing and sandboxing.
67
3
Article
Daily Dose of Data Science | Avi Chawla | Substack·38w
What is Context Engineering?
Context engineering is emerging as a critical skill for AI engineers, focusing on systematically orchestrating context rather than just clever prompting. Unlike traditional prompt engineering that relies on 'magic words', context engineering creates dynamic systems that provide the right information, tools, and format to LLMs. The approach addresses the real bottleneck in AI applications - not model capability, but setting up proper information architecture. Key components include dynamic information flow, smart tool access, memory management (both short-term and long-term), and format optimization. As AI models improve, context quality becomes the limiting factor for application success.
62
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·39w
Prompting vs. RAG vs. Finetuning
A decision framework for choosing between prompt engineering, RAG, and fine-tuning when building LLM applications. The choice depends on two key factors: the amount of external knowledge required and the level of model adaptation needed. RAG works best for custom knowledge bases without behavior changes, fine-tuning modifies model structure and behavior, prompt engineering suffices for basic adjustments, and hybrid approaches combine RAG with fine-tuning for complex requirements.
44
5
Article
Daily Dose of Data Science | Avi Chawla | Substack·41w
6 No-code LLM, Agents, and RAG Builder Tools for AI Engineers
Six open-source no-code tools enable AI engineers to build LLM applications, agents, and RAG systems without extensive programming. Featured tools include RAGFlow for document understanding, Langflow for visual agent building, LLaMA-Factory for model fine-tuning, Transformer Lab for local LLM experimentation, xpander for agent backends, and AutoAgent for natural language agent creation. These platforms collectively have over 200k GitHub stars and support various AI development workflows from training to deployment.
40

See all Daily Dose of Data Science | Avi Chawla | Substack archives