Best of Daily Dose of Data Science | Avi Chawla | Substack — 2025

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·44w
9 MCP Projects for AI Engineers
A collection of 9 Model Context Protocol (MCP) projects for AI engineers, covering applications from local MCP clients and agentic RAG systems to voice agents and synthetic data generators. The projects demonstrate how to integrate MCP with popular tools like Claude Desktop and Cursor IDE, enabling developers to build more sophisticated AI applications with enhanced tool connectivity and context sharing capabilities.
611
12
2
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
10 MCP, AI Agents, and RAG projects for AI Engineers
Explore 10 AI-focused projects including building an MCP-powered Agentic RAG, a multi-agent book writer, and a RAG system that understands audio content. Learn how to build and fine-tune AI models like DeepSeek-R1 and create applications using open-source tools like Llama 4 and ColPali.
473
3
3
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
9 RAG, LLM, and AI Agent Cheat Sheets
This post provides visual cheat sheets for AI engineers covering various topics, including Transformer vs. Mixture of Experts in LLMs, fine-tuning techniques, RAG vs Agentic RAG, strategies for chunking in RAG, levels of agentic AI systems, and more. These resources are designed to help cultivate essential skills for developing impactful AI and ML systems in the industry.
334
1
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
5 Agentic AI Design Patterns
Explore five agentic AI design patterns that enhance the effectiveness of AI agents through reflection, tool use, reason and act, planning, and multi-agent approaches. Learn how Firecrawl Extract facilitates web scraping by using simple English prompts to extract clean, structured data. Discover additional resources on machine learning techniques and data science provided by Daily Dose of Data Science.
294
5
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
AI Agent Crash Course—Part 1
In this crash course, learn about AI agents and their implementation. It covers the fundamentals, memory for agents, agentic flows, guardrails, implementing agentic design patterns, and optimizing agents for production. The aim is to build autonomous systems that can reason, plan, take actions, and correct themselves, going beyond the capabilities of standalone generative models.
199
1
6
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Pandas Mind Map
A detailed mind map of various Pandas methods categorized by their operation types, including I/O methods, DataFrame creation, statistical information, renaming, plotting, time-series, grouping, pivot, and categorical data methods. Additional ML resources and techniques are also provided for developing industry-relevant skills.
195
5
7
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
16 Techniques to Build Real-world RAG Systems
Scaling a prototype RAG system for real-world use presents significant challenges, such as performance bottlenecks and inefficient retrieval. This guide offers 16 practical techniques to help developers overcome these issues across five key pillars. It also highlights five agentic AI design patterns, including reflection, tool use, ReAct, planning, and multi-agent patterns, which enable LLMs to refine outputs, gather information, and subdivide tasks more effectively.
188
8
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
The Full MCP Blueprint
MCP (Model Context Protocol) provides a standardized way for LLMs to interact with tools and capabilities, solving the M×N integration problem where every tool needs manual connection to every model. The protocol enables dynamic tool discovery, plug-and-play interoperability between systems like Claude and Cursor, and transforms AI development from prompt engineering to systems engineering. MCP uses a Host-Client-Server architecture with JSON-RPC communication and supports various transport mechanisms including Stdio and HTTP.
187
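The JSON-RPC communication mentioned in the summary above can be illustrated with a minimal sketch. It assumes JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names from the MCP specification; the tool name `get_weather` and its arguments are hypothetical, and the helper functions are illustrative rather than part of any SDK.

```python
import json
from itertools import count

# Monotonic request ids so each response can be matched to its request.
_ids = count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as a plain dict."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def encode_for_stdio(msg):
    """Serialize one message as a single newline-terminated JSON line,
    the framing used when client and server talk over stdin/stdout."""
    return json.dumps(msg) + "\n"


# Ask the server which tools it exposes (dynamic tool discovery).
list_tools = make_request("tools/list")

# Invoke one tool; "get_weather" and its arguments are hypothetical.
call_tool = make_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Paris"}},
)

print(encode_for_stdio(list_tools), end="")
print(encode_for_stdio(call_tool), end="")
```

In a real client, these lines would be written to the server process's stdin, or sent over HTTP, and the responses read back and matched by `id`.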
9
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Building a Real-time Voice RAG Agent
Real-time voice interactions are becoming increasingly popular. This post provides a detailed, step-by-step guide on building a real-time Voice RAG Agent. Key components include using AssemblyAI for speech-to-text transcription, LlamaIndex for document-based answers, and Cartesia for generating seamless speech. The post includes a video and open-source code for easy implementation.
174
1
10
Article
Daily Dose of Data Science | Avi Chawla | Substack·35w
8 RAG Architectures for AI Engineers
Eight different RAG (Retrieval-Augmented Generation) architectures are explained with their specific use cases: Simple Vector RAG for basic semantic matching, Multi-modal RAG for cross-modal retrieval, HyDE for handling dissimilar queries, Self-RAG for validation against trusted sources, Graph RAG for structured relationships, Hybrid RAG combining vector and graph approaches, Adaptive RAG for dynamic query handling, and Agentic RAG for complex workflows with AI agents.
172
2
11
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
12 Powerful Tools For AI Agents
A comprehensive guide listing 12 powerful tools included in the CrewAI framework for building AI agents. The tools range from file reading and writing, code interpreting, and web scraping to advanced functionalities like RAG-powered searches and natural language to SQL conversion. Additionally, the post highlights a full crash course on AI agents, covering everything from fundamentals to production optimization.
162
12
Article
Daily Dose of Data Science | Avi Chawla | Substack·28w
A 100% Open-source Alternative to n8n!
Sim is an open-source drag-and-drop platform for building agentic workflows that runs locally with any LLM. The article demonstrates building a finance assistant connected to Telegram using agents, MCP servers, and APIs. It also covers four RAG indexing strategies: chunk indexing (splitting documents into embedded chunks), sub-chunk indexing (breaking chunks into finer pieces while retrieving larger context), query indexing (generating hypothetical questions for better semantic matching), and summary indexing (using LLM-generated summaries for dense data).
156
7
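The sub-chunk indexing strategy summarized above (match on fine-grained pieces, return the larger parent chunk) can be sketched as follows. This is a toy illustration under stated assumptions: word-overlap scoring stands in for embedding similarity, and all function names and sizes are hypothetical rather than taken from the article.

```python
def split_words(text, size):
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_sub_chunk_index(document, chunk_size=12, sub_size=4):
    """Return the parent chunks plus an index of (sub_chunk, parent_id) pairs."""
    chunks = split_words(document, chunk_size)
    index = []
    for parent_id, chunk in enumerate(chunks):
        for sub in split_words(chunk, sub_size):
            index.append((sub, parent_id))
    return chunks, index


def overlap(a, b):
    """Toy relevance score: number of shared lowercase words.
    A real system would compare embedding vectors instead."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(query, chunks, index):
    """Match the query against sub-chunks, then return the parent chunk."""
    _, parent_id = max(index, key=lambda entry: overlap(query, entry[0]))
    return chunks[parent_id]


doc = (
    "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu "
    "nu xi omicron pi rho sigma tau upsilon phi chi psi omega"
)
chunks, index = build_sub_chunk_index(doc)
result = retrieve("sigma tau", chunks, index)
print(result)
```

The query matches a four-word sub-chunk, but the caller receives the full twelve-word parent chunk, which is the point of the strategy: precise matching with wider context for the LLM.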
13
Article
Daily Dose of Data Science | Avi Chawla | Substack·50w
Build an MCP Server in 3 Steps
This post describes a simple three-step process to build an MCP server using tools like Gitingest and Google AI Studio, enabling the transformation of FastMCP repository data into LLM-readable text. It also highlights the capabilities of the Firecrawl framework, which converts websites into structured formats for AI applications.
152
14
Article
Daily Dose of Data Science | Avi Chawla | Substack·45w
48 Most Popular Open ML Datasets
A comprehensive compilation of 48 widely-used open machine learning datasets organized by domain including computer vision (ImageNet, COCO), natural language processing (SQuAD, GLUE), recommendation systems (MovieLens, new Yambda-5B), tabular data (UCI datasets, Titanic), reinforcement learning (OpenAI Gym), and multimodal learning (LAION-5B, VQA). Each dataset is briefly described with its primary use case and key characteristics, serving as a reference guide for researchers and practitioners selecting appropriate datasets for their ML projects.
140
1
15
Article
Daily Dose of Data Science | Avi Chawla | Substack·30w
The Open-source RAG Stack
A comprehensive guide to building production-ready RAG systems using open-source tools. Covers the complete technology stack from frontend frameworks to data ingestion, including LLM orchestration tools like LangChain and CrewAI, vector databases like Milvus and Chroma, embedding models, and retrieval systems. Also showcases 9 practical MCP (Model Context Protocol) projects for AI engineers, ranging from local MCP clients to voice agents and financial analysts.
135
16
Article
Daily Dose of Data Science | Avi Chawla | Substack·48w
Building a 100% local MCP Client
Learn how to build a completely local Model Context Protocol (MCP) client using tools like LlamaIndex, Ollama, and LightningAI. The tutorial walks through creating an MCP client capable of communicating with external tools and data sources through a structured protocol. It demonstrates setting up an SQLite server and building an AI agent using DeepSeek-R1 as the local LLM, providing users with context-aware responses based on their queries.
133
1
17
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
25 Most Important Mathematical Definitions in DS
A visual presentation of crucial mathematical definitions used in Data Science and Statistics, such as Gradient Descent, Normal Distribution, MLE, Z-score, and SVD. The post explains these terms and their significance in various applications like dimensionality reduction, optimization, and data modeling.
131
18
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
5 Levels of Agentic AI Systems
Agentic AI systems are capable of making decisions, calling functions, and running autonomous workflows. The levels of AI agency include basic responders, router patterns, tool calling, multi-agent patterns, and fully autonomous patterns. Each level indicates a different degree of independence and capability of the AI system.
129
19
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
5 Powerful MCP Servers
The post details five powerful MCP servers which enhance AI agents' capabilities. These servers include Firecrawl for web scraping, Browserbase for initiating browser sessions, Opik for monitoring LLM applications, Brave MCP server for utilizing Brave Search, and Sequential thinking for problem-solving through structured thinking processes. Additionally, the post introduces Stagehand, an innovative browser automation framework for AI agents.
126
1
20
Article
Daily Dose of Data Science | Avi Chawla | Substack·16w
The AI Engineering Guidebook
A comprehensive 350+ page guidebook covering the engineering fundamentals of LLM systems, including model architecture, training, prompt engineering, RAG systems, fine-tuning techniques like LoRA, AI agents, Model Context Protocol, optimization strategies, and deployment considerations. The resource focuses on practical engineering decisions, system design tradeoffs, and real-world implementation patterns rather than surface-level usage.
125
5
21
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Time Complexity of 10 ML Algorithms
Understanding the run-time complexity of machine learning algorithms is essential for efficient model implementation. Popular algorithms like SVM and t-SNE have limitations with large datasets due to their cubic and quadratic time complexities, respectively. Accurate knowledge of these complexities helps in selecting the right algorithm and optimizing performance.
105
22
Article
Daily Dose of Data Science | Avi Chawla | Substack·38w
4 Stages of Training LLMs from Scratch
Training large language models from scratch involves four key stages: pre-training on massive text corpora to learn language basics, instruction fine-tuning to make models conversational and follow commands, preference fine-tuning using human feedback (RLHF) to align with human preferences, and reasoning fine-tuning for mathematical and logical tasks using correctness as a reward signal. Each stage builds upon the previous one to create increasingly capable and aligned AI systems.
104
2
23
Article
Daily Dose of Data Science | Avi Chawla | Substack·43w
10 MCP, RAG and AI Agents Projects
A curated collection of 10 advanced AI engineering projects covering MCP-powered applications, RAG systems, and AI agents. Projects include video RAG with exact timestamp retrieval, corrective RAG with self-assessment, multi-agent flight booking systems, voice-enabled RAG agents, and local alternatives to ChatGPT's research features. The repository contains 70+ hands-on tutorials focusing on real-world implementations of LLMs, memory-enabled agents, multimodal document processing, and performance optimization techniques like binary quantization for 40x faster RAG systems.
99
24
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
Build an MCP Server to Connect to 200+ Data Sources
A guide to building a Model Context Protocol (MCP) server using MindsDB that can connect to over 200 data sources including Slack, Gmail, GitHub, and Hacker News. The setup uses Docker for local hosting and integrates with Cursor IDE, providing tools to list databases and query federated data through a unified interface. The implementation demonstrates practical use cases like fetching Hacker News data, sending formatted summaries to Slack, and retrieving Gmail messages.
98
25
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
Building an MCP-powered Financial Analyst
Explore the process of building a financial analyst powered by MCP (Model Context Protocol), which integrates AI components like the DeepSeek-R1 LLM and CrewAI for multi-agent orchestration. The system setup includes agents for query parsing, code writing, code execution, and visualization of financial data. The tech stack combines these tools into a financial analyst that runs locally, performs complex stock analysis, and generates visual outputs.
89
