Best of Llama — 2025

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Building a 100% local MCP Client
Learn how to build a completely local Model Context Protocol (MCP) client using tools like LlamaIndex, Ollama, and LightningAI. The tutorial walks through creating an MCP client that communicates with external tools and data sources through a structured protocol. It demonstrates setting up an SQLite server and building an AI agent that uses DeepSeek-R1 as the local LLM to give users context-aware responses to their queries.
133
1
2
Article
Daily Dose of Data Science | Avi Chawla | Substack·46w
MCP Integration with 4 Popular Agentic Frameworks
Part 8 of an MCP crash course demonstrates how to integrate Model Context Protocol with four popular agentic frameworks: LangGraph, CrewAI, LlamaIndex, and PydanticAI. The tutorial provides step-by-step practical walkthroughs for connecting MCP to each framework, along with detailed implementations. This builds on previous parts covering MCP fundamentals, custom client development, tools/resources/prompts, sampling integration, and security considerations including testing and sandboxing.
67
3
Article
Daily Dose of Data Science | Avi Chawla | Substack·43w
Make RAG systems 32x Memory Efficient!
Binary quantization can make RAG systems 32x more memory efficient by converting float32 embeddings to binary vectors. The technique involves ingesting documents, generating binary embeddings, storing them in a vector database like Milvus, and using Hamming distance for retrieval. A complete implementation demonstrates querying 36M+ vectors in under 30ms using LlamaIndex, Milvus, and Groq for inference, with deployment via Beam Cloud.
54
1
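To make the 32x figure above concrete, here is a minimal NumPy sketch of binary quantization with Hamming-distance retrieval. It is not the article's implementation: thresholding at zero, the 1024-dimensional random vectors, and the helper names `binarize`, `hamming_distances`, and `search` are all assumptions for illustration, and a brute-force scan stands in for a vector database such as Milvus.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Pack float embeddings into bits: 1 where a value is > 0, else 0.

    Thresholding at zero is an assumption; other schemes exist.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)  # 8 dimensions per stored byte


def hamming_distances(query_code: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and every packed document."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def search(query: np.ndarray, codes: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k documents closest to the query in Hamming distance."""
    query_code = binarize(query[None, :])[0]
    distances = hamming_distances(query_code, codes)
    return np.argsort(distances, kind="stable")[:k]


# Hypothetical data: 1,000 random 1024-dimensional float32 "embeddings".
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 1024)).astype(np.float32)
codes = binarize(docs)

# float32 uses 32 bits per dimension, the binary code uses 1 bit.
ratio = docs.nbytes / codes.nbytes
top = search(docs[7], codes, k=3)
```

The ratio follows directly from the storage widths: 32 bits per dimension for float32 against 1 bit per dimension after packing. Retrieval quality after quantization depends on the embedding model and is not shown here.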
4
Article
Hacker News·1y
Run DeepSeek-R1 Dynamic 1.58-bit
The post explains how to install and run the DeepSeek-R1 model and stresses the importance of adding BOS and EOS tokens when interacting with it. It gives setup instructions, including commands like `apt-get update` for dependencies and downloading the model via `huggingface_hub`, and outlines how to configure GPU offloading based on available memory. It also covers quantizing the model's K cache to 4-bit and running the model with those settings.
47
3
5
Video
AI Engineer·49w
Building AI Agents that actually automate Knowledge Work - Jerry Liu, LlamaIndex
Jerry Liu from LlamaIndex presents a framework for building AI agents that automate knowledge work over unstructured documents. He distinguishes between assistive agents (chat interfaces that help humans get information) and automation agents (background processes that handle routine tasks). The approach requires a comprehensive document toolbox with parsing capabilities for complex PDFs, Excel sheets, and other formats, plus appropriate agent architectures ranging from constrained to unconstrained workflows. Real-world applications include financial due diligence, enterprise search, and technical data sheet processing, with LlamaIndex providing cloud services for document parsing that outperform existing benchmarks.
37
6
Video
Hiten Shah·50w
Why Meta stole millions of books to train AI
Meta downloaded 82 terabytes of pirated books from shadow libraries like LibGen to train its Llama AI model, despite legal concerns from engineers. After publishers refused licensing deals deemed too expensive and slow, Meta chose piracy over falling behind competitors like OpenAI and Google. The pirated data improved Llama's performance by 5%, leading to 800 more correct answers. Meta covered its tracks by masking IP addresses and removing copyright tags, while relying on a fair use legal defense strategy shared across the AI industry when facing inevitable lawsuits from authors and publishers.
36
16
7
Article
Daily Dose of Data Science | Avi Chawla | Substack·39w
Corrective RAG Agentic Workflow
Corrective RAG (CRAG) enhances traditional RAG systems by adding a self-assessment step that evaluates retrieved document relevance before generating responses. The workflow searches documents, uses an LLM to assess context relevance, retains only relevant information, performs web search when needed, and aggregates context for final response generation. The implementation uses a tech stack including Firecrawl for web search, Milvus for vector storage, Beam for deployment, and LlamaIndex workflows for orchestration, with observability through CometML's Opik.
26
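The Corrective RAG steps summarized above can be sketched as plain control flow. This is a hedged illustration, not the article's code: the function `corrective_rag`, its callable parameters, and the `min_relevant` threshold are hypothetical, and the toy stand-ins below replace the real retriever, LLM grader, web search, and generator. The rule for when web search is "needed" is an assumption (fewer than `min_relevant` documents survive grading).

```python
from typing import Callable, List


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    min_relevant: int = 1,  # hypothetical threshold, not from the article
) -> str:
    # 1. Search the document store.
    candidates = retrieve(query)
    # 2-3. Grade each document and keep only the relevant ones.
    context = [doc for doc in candidates if is_relevant(query, doc)]
    # 4. Assumed trigger: fall back to web search when too little survives.
    if len(context) < min_relevant:
        context = context + web_search(query)
    # 5. Generate the final response from the aggregated context.
    return generate(query, context)


# Toy stand-ins so the sketch runs without any external service.
def toy_retrieve(query: str) -> List[str]:
    return ["milvus stores vectors", "a pasta recipe"]


def toy_grader(query: str, doc: str) -> bool:
    return "vectors" in doc


def toy_web_search(query: str) -> List[str]:
    return ["web result for: " + query]


def toy_generate(query: str, context: List[str]) -> str:
    return query + " | " + " ; ".join(context)


answer = corrective_rag(
    "what stores vectors?", toy_retrieve, toy_grader, toy_web_search, toy_generate
)
```

Passing the components as callables keeps the control flow separate from any particular stack; in a real system each one would wrap a vector store query, an LLM call, or a search API.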
