Best of Data Science: January 2025

  1. Article · Machine Learning Mastery

    The Roadmap for Mastering Machine Learning in 2025

    Machine learning (ML) is integral to many sectors, which makes it a valuable skill in 2025. This guide offers a step-by-step roadmap for mastering ML: prerequisites in mathematics and programming, then core ML concepts, deep learning, and specialization in fields like computer vision or NLP. It also covers model deployment and building a portfolio to showcase projects. The emphasis is on practical learning through projects and continuous skill development.

  2. Article · Daily Dose of Data Science | Avi Chawla | Substack

    5 Agentic AI Design Patterns

    Explore five agentic AI design patterns that make AI agents more effective: reflection, tool use, reason and act (ReAct), planning, and multi-agent collaboration. The post also shows how Firecrawl Extract simplifies web scraping by turning plain-English prompts into clean, structured data, and points to further machine learning and data science resources from Daily Dose of Data Science.
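
The reason-and-act and tool-use patterns can be sketched as a loop in which a model either calls a tool or returns an answer, with each tool result fed back to it as an observation. This is a minimal sketch, not the post's code: `scripted_model` is a hypothetical stand-in for an LLM call (it ignores the question text), and the `TOOLS` registry is illustrative.

```python
# Hypothetical tool registry: name -> callable.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def scripted_model(question, observations):
    """Hypothetical stand-in for an LLM: picks the next step from what it has seen so far."""
    if not observations:
        # Reason: the question needs a multiplication, so act by calling a tool.
        return ("act", "multiply", (6, 7))
    # Enough information gathered: produce the final answer.
    return ("answer", f"The result is {observations[-1]}")

def react_loop(question, model, tools, max_steps=5):
    """Alternate model decisions and tool calls until the model answers."""
    observations = []
    for _ in range(max_steps):
        kind, *payload = model(question, observations)
        if kind == "answer":
            return payload[0]
        tool_name, args = payload
        observations.append(tools[tool_name](*args))  # observe the tool's result
    raise RuntimeError("no answer within the step budget")

print(react_loop("What is 6 times 7?", scripted_model, TOOLS))
```

In a real agent the model would also emit its reasoning text at each step; the `max_steps` cap guards against runaway tool calls.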

  3. Article · Daily Dose of Data Science | Avi Chawla | Substack

    Pandas Mind Map

    A detailed mind map groups Pandas methods by operation type, including I/O, DataFrame creation, statistical information, renaming, plotting, time series, grouping, pivoting, and categorical data. The post also points to additional ML resources and techniques for developing industry-relevant skills.

  4. Article · Data Engineer Things

    End to End Data Engineering

    This post details the tools, technologies, and concepts essential to data engineering, emphasizing that the path to success differs by role and background. It highlights the importance of both the analytics and infrastructure sides and mentions popular tools such as Airflow and Snowflake. It also discusses the significance of software engineering principles and of specific data engineering roles.

  5. Article · Medium

    0$ to 70.000$ Freelance Journey

    The author's journey from aspiring basketball player to freelancer, earning $70,000 from software development projects for clients worldwide, began during the pandemic in May 2020. After learning from YouTube tutorials and courses on platforms like Coursera, the author eventually found success on Upwork. Key takeaways include the importance of patience, choosing a niche, communicating professionally, and continually experimenting and learning.

  6. Article · Daily Dose of Data Science | Avi Chawla | Substack

    A crash course on RAG systems—Part 9

    Part 9 of the crash course on RAG systems is a comprehensive guide to building powerful RAG systems with a focus on vision-language models. It includes a detailed breakdown of ColPali, a state-of-the-art vision-language retrieval model, covering its scalability, its accuracy, and its integration with binary quantization for low-latency applications. The series is beginner-friendly and covers everything from fundamentals to advanced optimization and multimodal applications.
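
Binary quantization keeps one bit per embedding dimension, so similarity search can use cheap Hamming distance instead of floating-point dot products. A minimal sketch, assuming sign-based binarization; the exact scheme used in the post's ColPali pipeline may differ, and the vectors below are made up for illustration rather than produced by a real model.

```python
def binarize(vec):
    """Pack the sign of each dimension into an int: bit i is 1 if vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

# Illustrative float embeddings (not real model output).
docs = {
    "invoice": [0.8, -0.1, 0.4, -0.7],
    "recipe": [-0.5, 0.9, -0.2, 0.3],
}
query = [0.6, -0.3, 0.2, -0.9]

codes = {name: binarize(v) for name, v in docs.items()}
q = binarize(query)
ranking = sorted(codes, key=lambda name: hamming(q, codes[name]))
print(ranking)
```

Replacing a 32-bit float with a single bit shrinks storage 32x; the cost is some loss of ranking precision, which can be recovered by rescoring the top candidates with the full-precision vectors.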

  7. Article · Daily Dose of Data Science | Avi Chawla | Substack

    Build Human-like Memory for Your AI Agents

    Zep introduces a temporally-aware knowledge graph for AI agents, addressing real-time knowledge updates and fast data retrieval. The architecture comprises three layers: episodic memory for storing raw data, semantic memory for extracting entities and relationships, and community memory for summarizing related entities. This design improves accuracy by up to 18.5% and reduces latency by 90% compared to traditional approaches like MemGPT.

  8. Article · Machine Learning Mastery

    Future-Proof Your Machine Learning Career in 2025

    A successful machine learning career in 2025 requires future-proofing skills through a mix of core technical knowledge, embracing emerging trends, and developing essential soft skills. Key technical skills include programming, foundational math and statistics, data handling, and model evaluation. Staying updated with trends like multimodal generative AI, autonomous agents, explainable AI, and ethical AI is crucial. Additionally, cultivating communication, problem-solving, adaptability, and continuous learning will differentiate good professionals from great ones.

  9. Article · Daily Dose of Data Science | Avi Chawla | Substack

    Our Two Agentic Apps Built with CrewAI

    CrewAI is an open-source framework designed for orchestrating advanced AI agent systems. It offers customizable agents, collaborative intelligence, flexible task management, reliable architecture, and versatile orchestration. The post highlights two demos: an automated social media content generator and a multi-agent news generator, showcasing CrewAI's capabilities.

  10. Article · Daily Dose of Data Science | Avi Chawla | Substack

    FireDucks vs. Pandas vs. DuckDB vs. Polars

    FireDucks is an optimized alternative to Pandas with the same API, so adopting it only requires replacing the import statement. In the post's benchmarks it achieves an average 125x speed-up over Pandas on large-data operations. Unlike Pandas, which executes each operation immediately, FireDucks executes lazily: it builds a logical execution plan and optimizes it before running. It works in IPython, Jupyter Notebooks, and existing Pandas pipelines, and the post includes detailed benchmarks and usage examples showing substantial performance gains in practical scenarios.

  11. Article · Daily Dose of Data Science | Avi Chawla | Substack

    Loss Function of 16 ML Algos

    The post gives a visual summary of the loss functions used in 16 common machine learning algorithms and stresses choosing an appropriate loss function for each task. It covers algorithms such as linear regression, logistic regression, decision trees, SVMs, neural networks, and several boosting methods, and suggests further resources and readings for applying these ideas in real-world scenarios.
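
Three of the standard pairings can be written out directly: mean squared error for linear regression, log loss (binary cross-entropy) for logistic regression, and hinge loss for SVMs. A minimal pure-Python sketch; the post's own formulations and notation may differ.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, the usual loss for linear regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for logistic regression; labels in {0, 1}, p = P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def hinge(y_true, scores):
    """Hinge loss for SVMs; labels in {-1, +1}, scores are raw decision values."""
    return sum(max(0.0, 1 - y * s) for y, s in zip(y_true, scores)) / len(y_true)

print(mse([3.0, 5.0], [2.5, 5.5]))
print(log_loss([1, 0], [0.9, 0.2]))
print(hinge([1, -1], [2.0, 0.5]))
```

Note how hinge loss is exactly zero once a point is on the correct side of the margin (`y * s >= 1`), while log loss keeps rewarding more confident correct predictions.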

  12. Article · Daily Dose of Data Science | Avi Chawla | Substack

    A crash course on RAG systems—Part 8

    Part 8 of the crash course on building RAG systems focuses on improving rerankers with an in-depth architectural breakdown of the ColBERT model, which balances scalability and accuracy for reranking modules. The series covers foundational components, evaluation, optimization, multimodality, and graph-based RAG systems, designed to help beginners implement reliable RAG systems effectively.
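
ColBERT's late-interaction scoring (often called MaxSim) compares every query token embedding with every document token embedding, keeps each query token's best match, and sums those maxima. A minimal sketch with tiny hand-written 2-D unit vectors standing in for real token embeddings; production systems compute the same thing with batched matrix operations.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_tokens, doc_tokens):
    """Late interaction: best-matching doc token for each query token, summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def rerank(query_tokens, candidates):
    """Order candidate documents (name -> token embeddings) by MaxSim score, best first."""
    return sorted(candidates, key=lambda name: maxsim(query_tokens, candidates[name]), reverse=True)

# Illustrative unit-length token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "doc_a": [[1.0, 0.0], [0.6, 0.8]],
    "doc_b": [[0.0, 1.0], [0.0, 1.0]],
}
print(rerank(query, candidates))
```

Because document token embeddings can be computed ahead of time, only the query has to be encoded at search time, which is where the scalability of this design comes from.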

  13. Article · JetBrains

    Anomaly Detection in Machine Learning Using Python

    Anomaly detection identifies outliers in large datasets, which is crucial in applications such as security alerts, fraud detection, and system monitoring. Machine learning techniques such as OneClassSVM and Isolation Forest improve the accuracy and efficiency of anomaly detection. This guide shows how to apply these algorithms with Python and tools like PyCharm, with practical examples on the Beehives dataset.
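
The intuition behind Isolation Forest is that anomalies are easier to isolate: random axis-aligned splits separate an outlier from the rest of the data in fewer steps than a point inside a dense cluster. The sketch below is a simplified, from-scratch illustration of that idea (no subsampling or score normalization), not the scikit-learn `IsolationForest` used in the guide, and the 2-D data is invented.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Count random splits needed to isolate `point` from the other rows in `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))            # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                               # this feature cannot split the rows
        return depth
    split = rng.uniform(lo, hi)                # random threshold within the range
    # Keep only the rows on the same side of the split as `point`.
    same_side = [row for row in data if (row[dim] < split) == (point[dim] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees; lower means more anomalous."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Invented data: a tight 5x4 grid near the origin plus one distant point.
data = [(x * 0.1, y * 0.1) for x in range(5) for y in range(4)] + [(10.0, 10.0)]
for p in [(0.2, 0.1), (10.0, 10.0)]:
    print(p, round(mean_depth(p, data), 2))
```

The real algorithm builds each tree once on a random subsample and converts path lengths into a normalized anomaly score, but the ranking signal is the same: short paths flag outliers.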

  14. Video · YouTube

    OSINT 2025: How to Gather All the Info You’ll Ever Need on Anyone.

    Open-Source Intelligence (OSINT) involves gathering publicly accessible information from sources such as social media, websites, and public databases. Key OSINT tools include Google Dorking, theHarvester, ExifTool, Photon, Sherlock, Maltego, and Shodan, which help uncover patterns, track behaviors, and identify vulnerabilities. Whether for research, cybersecurity, or investigations, these tools must be used ethically and legally.

  15. Article · Towards AI

    Data Scientists in the Age of AI Agents and AutoML

    The role of data scientists is changing with the advent of AI agents, AutoML, and pre-trained models. Traditional skills like Python scripting and model building are no longer sufficient on their own. Modern data scientists need to focus on end-to-end solutions, understand the entire data lifecycle, cloud platforms, and CI/CD practices, and bring strong business acumen. Mastery of tools like Docker, Kubernetes, and major cloud services is essential. The emphasis is shifting from coding to integrating models into scalable, business-critical systems.

  16. Article · Faun

    Essential Python Libraries for Data Science in 2025

    Python remains the top choice for data science due to its versatility, simplicity, and strong community support. Essential Python libraries such as NumPy, Pandas, Scikit-learn, TensorFlow, PyTorch, SciPy, Altair, XGBoost, Statsmodels, and Plotly are highlighted for their key features and use cases, all of which continue to evolve to meet the demands of data science in 2025. Staying current with these libraries is crucial for maintaining efficiency, competitiveness, and innovation in the field.

  17. Video · Computerphile

    Solve Markov Decision Processes with the Value Iteration Algorithm - Computerphile

    The value iteration algorithm solves Markov decision processes (MDPs), which model decision-making under uncertainty, by computing an optimal policy: the action to take in each state so as to maximize expected reward (or, equivalently, minimize expected cost). It is a dynamic programming method that repeatedly updates each state's value from the values of the states it can lead to until the values converge; the optimal policy then picks, in each state, the action with the best expected value.
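
Value iteration repeatedly applies the Bellman optimality update V(s) = max over a of [R(s, a) + γ · Σ P(s' | s, a) · V(s')] until the values stop changing, then reads off the greedy policy. A minimal sketch on a made-up three-state MDP in which every step costs 1 (reward -1) and moving right from `s1` succeeds only 80% of the time; the states, rewards, probabilities, and discount γ = 0.9 are illustrative, not taken from the video.

```python
def q_value(s, a, values, transitions, rewards, gamma):
    """Expected return of taking action a in state s, then following the current values."""
    return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])

def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-9):
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality update (in place, so later states see fresh values).
            best = max(q_value(s, a, values, transitions, rewards, gamma) for a in actions[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions[s], key=lambda a: q_value(s, a, values, transitions, rewards, gamma))
        for s in states
    }
    return values, policy

states = ["s0", "s1", "goal"]
actions = {"s0": ["stay", "right"], "s1": ["stay", "right"], "goal": ["stay"]}
# transitions[s][a] is a list of (probability, next_state) pairs.
transitions = {
    "s0": {"stay": [(1.0, "s0")], "right": [(1.0, "s1")]},
    "s1": {"stay": [(1.0, "s1")], "right": [(0.8, "goal"), (0.2, "s1")]},
    "goal": {"stay": [(1.0, "goal")]},
}
rewards = {
    "s0": {"stay": -1.0, "right": -1.0},
    "s1": {"stay": -1.0, "right": -1.0},
    "goal": {"stay": 0.0},
}

values, policy = value_iteration(states, actions, transitions, rewards)
print(policy)
print({s: round(v, 3) for s, v in values.items()})
```

Updating values in place (rather than from a frozen copy of the previous sweep) is a common variant that still converges to the same fixed point, often in fewer sweeps.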

  18. Article · Real Python

    Learn From 2024's Most Popular Python Tutorials and Courses – Real Python

    Python 3.13 introduced significant features such as an experimental free-threaded build and an experimental JIT compiler, both aimed at performance, along with a redesigned interactive REPL. Python topped programming language rankings in 2024, and the year saw major releases of NumPy and Polars. Real Python showcased a variety of 2024's top tutorials and courses, covering basics, data science, object-oriented programming, web development, error handling, and more. These resources aim to help developers improve their skills, manage projects efficiently, and create innovative applications.

  19. Article · Medium

    How to Start LeetCode in 2025 (as a beginner)

    LeetCode remains a crucial tool for preparing for coding interviews at top tech companies in 2025. This guide offers practical tips on how to begin with LeetCode, starting from understanding the fundamentals of data structures and algorithms to focusing on problem-solving patterns. It emphasizes the importance of focusing on quality over quantity by solving a curated list of problems, avoiding memorization, practicing in timed environments, and being consistent in one’s efforts.

  20. Article · JetBrains

    Data Cleaning in Data Science

    Data cleaning is essential for transforming real-world, messy datasets into reliable sources for analysis or machine learning. This involves removing duplicates, dealing with implausible values, addressing formatting issues, outliers, and missing values. Proper data cleaning ensures that conclusions drawn from the data can be generalized to a defined population. Best practices include defining your population boundaries, ensuring reproducibility, and keeping methods well-documented.
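
Several of these steps can be combined into one small, reproducible function. A minimal pure-Python sketch, assuming a hypothetical record schema with `name` and `age` fields and an assumed plausible age range of 0 to 120: it normalizes formatting, treats implausible ages as missing, drops exact duplicates, and imputes missing ages with the median. Outlier handling is not shown, and median imputation is only one of several reasonable choices.

```python
import statistics

def clean_records(records, valid_age=(0, 120)):
    """Hypothetical schema: each record is a dict with "name" (str) and "age" (int or None)."""
    lo, hi = valid_age
    normalized = []
    for rec in records:
        name = rec["name"].strip().lower()           # formatting: trim and lowercase
        age = rec.get("age")
        if age is not None and not lo <= age <= hi:   # implausible value -> treat as missing
            age = None
        normalized.append((name, age))
    deduped = list(dict.fromkeys(normalized))         # drop exact duplicates, keep first-seen order
    known = [age for _, age in deduped if age is not None]
    fill = statistics.median(known) if known else None
    return [{"name": n, "age": a if a is not None else fill} for n, a in deduped]

raw = [
    {"name": " Ana ", "age": 34},
    {"name": "ana", "age": 34},      # duplicate once formatting is normalized
    {"name": "Ben", "age": -5},      # implausible age
    {"name": "Cy", "age": None},     # missing age
    {"name": "Dee", "age": 40},
]
print(clean_records(raw))
```

Keeping every step in one deterministic function, with the valid range as an explicit parameter, is what makes the cleaning reproducible and easy to document.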

  21. Article · Hacker News

    zasper-io/zasper: Supercharged IDE for Data Science

    Zasper is a high-performance IDE designed to support massive concurrency with a minimal memory footprint. It utilizes significantly fewer resources than JupyterLab and can be run on local machines without requiring cloud support. Zasper is available as both an Electron App and a Web App. It primarily targets data scientists and AI engineers, aiming to enhance efficiency and support custom data applications.

  22. Article · Daily Dose of Data Science | Avi Chawla | Substack

    7 Uses of Underscore in Python

    The underscore in Python has several uses: retrieving the last computed value in the interactive interpreter, serving as a placeholder for unused loop variables, making large numeric literals easier to read (e.g. 1_000_000), and signaling naming conventions for variables and methods. These conventions are a single leading underscore, a single trailing underscore, a double leading underscore, and double leading and trailing underscores for magic (dunder) methods.
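
Most of these uses fit in a few lines. A minimal sketch; the names (`Account`, `population`, and so on) are illustrative. The last-value use of `_` only works in the interactive interpreter, so it appears as a comment rather than as code.

```python
# Interactive interpreter only: after evaluating `2 + 3`, typing `_` gives 5.

# Placeholder for values you do not need.
total = 0
for _ in range(3):
    total += 1
first, _, last = ("a", "b", "c")

# Digit grouping in large numeric literals.
population = 8_000_000_000

class Account:
    def __init__(self):
        self._balance = 0        # single leading: internal by convention
        self.__pin = "0000"      # double leading: name-mangled to _Account__pin

    def __repr__(self):          # double leading and trailing: magic (dunder) method
        return f"Account(balance={self._balance})"

class_ = "premium"               # single trailing: avoids clashing with the keyword `class`

acct = Account()
print(total, last, population, acct, hasattr(acct, "_Account__pin"))
```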

  23. Article · Towards AI

    Best Laptop For Data Science

    Choosing the right laptop for data science is less critical than it may seem; the best one is the one you already have. Still, some laptops offer advantages. The post explores the best options for data science work across the three main operating systems, Windows, macOS, and Linux, considering each one's features and user experience.

  24. Article · Towards AI

    I Switched From Windows To Linux For 1 Month — Here Is What Happened

    When the author's Windows PC broke, they had to rely on a Linux laptop for all their computing needs for a month. They used Pop!_OS for various tasks including browsing, gaming, studying, machine learning, and development. The post explores whether Linux can fully replace Windows, especially from a data scientist's perspective, by discussing the benefits, challenges, and differences experienced during this period.

  25. Article · GoPenAI

    From Messy Text to Model-Ready Data: A Guide to NLP Preprocessing

    NLP preprocessing transforms raw text into structured data ready for machine learning models. Key steps include text cleaning, tokenization, stopword removal, lemmatization, part-of-speech tagging, named entity recognition, and text vectorization. Effective preprocessing enhances model performance, making it crucial for tasks like sentiment analysis, chatbots, and language translation.
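
Several of these steps (cleaning, tokenization, stopword removal, and bag-of-words vectorization) can be shown with the standard library alone. A minimal sketch; the tiny `STOPWORDS` set is illustrative, the regex keeps only ASCII letters and digits, and lemmatization, part-of-speech tagging, and named entity recognition are omitted because they normally rely on libraries such as spaCy or NLTK.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "it"}  # illustrative subset

def preprocess(text):
    """Clean, tokenize, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation and other symbols
    tokens = text.split()                      # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(docs):
    """Vectorize documents as token counts over a shared, sorted vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

print(preprocess("<p>The movie is GREAT, and fun!</p>"))
print(bag_of_words(["good movie", "bad movie movie"]))
```

Count vectors like these are the simplest input a classifier can consume; TF-IDF weighting or learned embeddings are common next steps.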