Best of Daily Dose of Data Science | Avi Chawla | Substack — November 2024

1
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
A Crash Course on Building RAG Systems – Part 4
Part 4 of the crash course on building RAG systems focuses on implementing RAG on multimodal data, specifically complex documents with tables, text, and images. The series covers foundational components, evaluation methods, optimization techniques, and handling large datasets, and it is written to be beginner-friendly. Knowing how to build reliable RAG systems can reduce costs and improve scalability for enterprises by avoiding the need to fine-tune large language models (LLMs).
118
2
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Pandas vs. FireDucks Performance Comparison
FireDucks is a highly optimized alternative to Pandas that achieves a significant speedup through lazy execution. Adopting it only requires replacing the Pandas import with the FireDucks one. The benchmarks in the post show FireDucks outperforming Pandas as well as other libraries such as Modin and Polars. The post also provides instructions for installing FireDucks, using it in Jupyter Notebook, and integrating it into existing Python scripts.
98
2
3
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Simplify Python Imports with Explicit Packaging
Learn how to simplify your Python project imports by explicitly packaging your project with an __init__.py file. This approach helps avoid redundant imports and lets you specify which classes and functions the package exposes for import. The article explains the difference between modules, packages, and libraries, and provides a step-by-step guide on how to use __init__.py to streamline your code.
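The idea can be sketched in a few lines. This is a minimal illustration rather than the article's own code: the package name `demo_shapes`, its modules, and its functions are all hypothetical, and the package is written to a temporary directory only so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temp directory for illustration:
#   demo_shapes/
#       __init__.py
#       circle.py
#       square.py
root = Path(tempfile.mkdtemp())
pkg = root / "demo_shapes"
pkg.mkdir()

(pkg / "circle.py").write_text(
    "import math\n\n"
    "def circle_area(r):\n"
    "    return math.pi * r ** 2\n"
)
(pkg / "square.py").write_text(
    "def square_area(side):\n"
    "    return side ** 2\n\n"
    "def _validate(side):\n"
    "    return side >= 0\n"
)

# __init__.py re-exports the public names at the package level.
# __all__ lists what `from demo_shapes import *` will bring in.
(pkg / "__init__.py").write_text(
    "from .circle import circle_area\n"
    "from .square import square_area\n\n"
    "__all__ = ['circle_area', 'square_area']\n"
)

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# One short import instead of one import per submodule.
from demo_shapes import circle_area, square_area

print(square_area(3))              # 9
print(round(circle_area(1.0), 4))  # 3.1416
```

Without the re-exports in `__init__.py`, callers would have to write `from demo_shapes.circle import circle_area` and `from demo_shapes.square import square_area`. Note that `__all__` only governs `from demo_shapes import *`; helpers such as `_validate` remain reachable through their submodule.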
56
1
4
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
Building a Multi-agent Financial Analyst
The post demonstrates how to build a multi-agent financial analyst using Microsoft's AutoGen and Llama3-70B. It outlines the tech stack, including the roles of the code executor and code writer agents. The guide walks through setting up the agents, executing code, and displaying stock analysis results. It also points to additional resources and a GitHub repository for further exploration.
36
5
Article
Daily Dose of Data Science | Avi Chawla | Substack·1y
16 Popular Open-source Contributions by Big Tech
Big tech companies like Microsoft, Google, Meta, Yandex, and NVIDIA have significantly contributed to the machine learning ecosystem through various open-source projects. These contributions include Microsoft's DeepSpeed and ONNX, Google's TensorFlow and JAX, Meta's PyTorch and LLaMA, Yandex's CatBoost and ClickHouse, and NVIDIA's RAPIDS and TensorRT. Understanding these tools can help you tackle real-world problems efficiently.
35
1
