Best of MLOps 2025

  1.
    Article
    SwirlAI · 1y

    The evolution of Modern RAG Architectures.

    The post delves into the evolution of Retrieval Augmented Generation (RAG) architectures, discussing their development from Naive RAG to advanced techniques, including Cache Augmented Generation (CAG) and Agentic RAG. It highlights the challenges addressed at each stage, advanced methods to improve accuracy, and the potential future advancements in RAG systems.
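
    As a rough illustration of the Naive RAG starting point named above (retrieve relevant text, then pass it to the model as context), here is a minimal sketch. The bag-of-words `embed` function, the `DOCS` list, and the prompt wording are all assumptions made for illustration; a real system would use a learned embedding model and a vector store, and nothing here is taken from the post itself.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical two-document corpus.
DOCS = [
    "Naive RAG retrieves chunks and passes them to the model.",
    "Canary testing routes a small share of traffic to a new model.",
]

top = retrieve("how does naive rag work", DOCS)
prompt = build_prompt("how does naive rag work", top)
```

    The generation step is left out on purpose: `prompt` is what would be handed to whichever model is in use.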

  2.
    Article
    Medium · 1y

    Building a TikTok-like recommender

    A comprehensive guide to building a TikTok-like real-time personalized recommender system. It details the architecture, including the 4-stage recommender model and the two-tower neural network design, and uses an H&M retail dataset to teach feature engineering, model training, and serving on the Hopsworks AI Lakehouse. The post is part of an open-source course focused on deploying scalable recommenders.
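
    The two-tower idea can be sketched in a few lines: one tower maps user features and another maps item features into the same embedding space, and candidates are ranked by dot product. The single linear layer, the weights, and the feature vectors below are hypothetical and hand-picked; the guide's actual model and training code are not reproduced here.

```python
def tower(features: list[float], weights: list[list[float]]) -> list[float]:
    """One linear layer: project a feature vector into the shared embedding space."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]


def score(user_vec: list[float], item_vec: list[float]) -> float:
    """Relevance is the dot product of the two tower outputs."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


# Hypothetical weights; in practice both towers are trained jointly.
USER_W = [[1.0, 0.0], [0.0, 1.0]]
ITEM_W = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

user = tower([1.0, 0.0], USER_W)
items = {"jacket": [1.0, 1.0, 0.0], "socks": [0.0, 0.0, 1.0]}

# Rank items for this user, highest score first.
ranked = sorted(
    items,
    key=lambda name: score(user, tower(items[name], ITEM_W)),
    reverse=True,
)
```

    Because the item tower does not depend on the user, item embeddings can be computed ahead of time and looked up at serving time.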

  3.
    Article
    SwirlAI · 1y

    Building Deep Research Agent from scratch

    The post guides readers through building a Deep Research Agent using the DeepSeek R1 model. It explains the concept of Deep Research Agents, outlines their components and the steps involved, and provides a thorough implementation guide using SambaNova's platform. The setup includes planning the research, splitting tasks, performing in-depth web searches, reflecting on gathered data, and summarizing results into a final research report. The necessary code and prompts are shared for an interactive learning experience.

  4.
    Article
    SwirlAI · 36w

    Learning AI Engineering in 2025

    An AI engineering bootcamp instructor reflects on the success of their first cohort, sharing metrics like 40 hours of live lectures and 250 pages of materials. The program focuses on building production-ready AI systems end-to-end, with upcoming improvements including deeper evaluation focus, context engineering, guest lectures, and Modal cloud partnerships. The bootcamp targets data scientists, ML engineers, founders, and software engineers looking to transition into AI engineering.

  5.
    Article
    Machine Learning Mastery · 50w

    10 MLOps Tools for Machine Learning Practitioners to Know

    MLOps combines machine learning with DevOps practices to streamline model lifecycle management from training to deployment. Ten essential tools are highlighted: MLflow for experiment tracking, Weights & Biases for visualization, Comet for monitoring, Airflow for workflow automation, Kubeflow for Kubernetes-based pipelines, DVC for data versioning, Metaflow for Python workflows, Pachyderm for data pipelines, Evidently AI for model monitoring, and TensorFlow Extended for complete ML pipelines. These tools address different aspects of MLOps including experiment tracking, workflow automation, data versioning, and model monitoring to help teams build reliable, production-ready machine learning systems.

  6.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 41w

    The Full MLOps/LLMOps Blueprint

    MLOps extends beyond model training to encompass the entire production ML system lifecycle, including data pipelines, deployment, monitoring, and infrastructure management. The crash course covers foundational concepts like why MLOps matters, differences from traditional DevOps, and system-level concerns, followed by hands-on implementation of the complete ML workflow from training to API deployment. MLOps applies software engineering and DevOps practices to manage the complex infrastructure surrounding ML code, ensuring reliable delivery of ML-driven features at scale.

  7.
    Article
    SwirlAI · 1y

    Simple way to explain Memory in AI Agents.

    SwirlAI is partnering with NVIDIA to give away an NVIDIA RTX 4080 SUPER GPU. To enter, register for the GTC 2025 conference, which is free and runs March 17-21, both in San Jose, CA, and virtually. Highlights include sessions on humanoid robots, generative AI for edge applications, and advancements in European robotics. The post also explains four types of memory in AI agents: episodic, semantic, procedural, and short-term (working) memory.
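
    One way to picture the four memory types is as fields on a single container. The sketch below is an assumption-laden illustration, not the post's implementation: the `AgentMemory` class, its field meanings (past interactions, facts, instructions, recent turns), and the window size of 3 are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Long-term stores (assumed interpretations of each memory type).
    episodic: list[str] = field(default_factory=list)        # past interactions
    semantic: dict[str, str] = field(default_factory=dict)   # facts about the world
    procedural: dict[str, str] = field(default_factory=dict) # prompts, tool instructions
    # Short-term (working) memory: only the most recent turns are kept.
    working: deque = field(default_factory=lambda: deque(maxlen=3))

    def observe(self, message: str) -> None:
        """Record a message in both the short-term window and the episodic log."""
        self.working.append(message)
        self.episodic.append(message)

    def context(self) -> str:
        """Build the text that would be placed in the model's context window."""
        facts = "; ".join(f"{k}={v}" for k, v in self.semantic.items())
        recent = " | ".join(self.working)
        return f"facts: {facts}\nrecent: {recent}"


memory = AgentMemory(semantic={"user_name": "Ada"})
for turn in ["hi", "book a flight", "to Vilnius", "next Monday"]:
    memory.observe(turn)
```

    After four turns the working window holds only the last three, while the episodic log keeps all four.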

  8.
    Article
    Towards Data Science · 22w

    6 Technical Skills That Make You a Senior Data Scientist

    Senior data scientists distinguish themselves through a structured six-stage workflow for building data products: mapping the business ecosystem, defining product constraints as operators, designing systems end-to-end before coding, starting with simple models and adding complexity only when justified, rigorously evaluating outputs through manual review and appropriate metrics, and tailoring communication to different audiences (product managers, engineers, other data scientists). The emphasis is on understanding context, making design-level trade-offs, and delivering production-ready solutions rather than just technical coding ability.

  9.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 1y

    4 Ways to Test ML Models in Production

    Testing machine learning models in production is crucial for reliability. Four key strategies are A/B testing, canary testing, interleaved testing, and shadow testing. These methods allow models to be tested on real-world data while minimizing risk and user impact. Tools like Maxim can aid in simulating, evaluating, and observing AI agents for better performance before deployment.
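
    Two of the strategies listed above, canary and shadow testing, can be sketched briefly. This is a minimal illustration under stated assumptions: the function names, the hash-based bucketing, and the in-memory `log` list are hypothetical choices, and the post's own tooling is not shown.

```python
import hashlib
from typing import Callable


def bucket(user_id: str) -> float:
    """Map a user id to a stable value in [0, 1) so routing is deterministic."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def canary_route(user_id: str, fraction: float) -> str:
    """Canary: send roughly `fraction` of users to the candidate model."""
    return "candidate" if bucket(user_id) < fraction else "current"


def shadow_predict(x, current: Callable, candidate: Callable, log: list):
    """Shadow: always serve the current model, record the candidate's output."""
    served = current(x)
    try:
        log.append((x, served, candidate(x)))
    except Exception as exc:  # a failing candidate must never affect users
        log.append((x, served, repr(exc)))
    return served


log: list = []
result = shadow_predict(3, lambda v: v + 1, lambda v: v * 2, log)
```

    Hashing the user id keeps each user on the same model across requests, which makes the two groups easier to compare.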

  10.
    Article
    Towards Data Science · 35w

    How to Become a Machine Learning Engineer (Step-by-Step)

    A comprehensive roadmap for becoming a machine learning engineer, covering essential skills from mathematics and statistics to Python programming, SQL, machine learning algorithms, deep learning, software engineering practices, and MLOps. The guide emphasizes practical learning with specific resource recommendations for each area, highlighting that engineering skills are often more important than theoretical knowledge for career success.

  11.
    Article
    Javarevisited · 29w

    I’ve Read 20+ Books on AI and LLM — Here Are My Top 5 Recommendations for 2026

    A curated list of five essential books for learning AI and LLM engineering, covering practical topics from building and fine-tuning models to production deployment. The recommendations include hands-on guides for prompt optimization, retrieval-augmented generation, model evaluation, infrastructure design, and understanding transformer architectures from scratch. Each book emphasizes production-ready engineering practices including monitoring, cost optimization, and system design rather than pure theory.

  12.
    Article
    Hugging Face · 29w

    huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning

    The huggingface_hub Python library has reached v1.0 after five years of development, now powering 200,000 dependent libraries and providing access to over 2 million models, 500,000 datasets, and 1 million Spaces. Major changes include migration from requests to httpx for modern HTTP infrastructure, a redesigned CLI replacing huggingface-cli with expanded features, and full adoption of hf_xet for file transfers with chunk-level deduplication. The release removes legacy patterns like the Git-based Repository class while maintaining backward compatibility for most ML libraries, though transformers v5 will be required for full v1.x support.

  13.
    Article
    Towards AI · 1y

    Data Scientists in the Age of AI Agents and AutoML

    The role of data scientists is transforming with the advent of AI agents, AutoML, and pre-trained models. Traditional skills like Python scripting and model building are no longer sufficient. Modern data scientists need to focus on end-to-end solutions, understand the entire data lifecycle, cloud platforms, and CI/CD practices, and possess strong business acumen. Mastery of tools like Docker, Kubernetes, and major cloud services is essential. The emphasis is shifting from coding to integrating models into scalable, business-critical systems.

  14.
    Article
    Collections · 49w

    Building an End-to-End MLOps Pipeline for YouTube Sentiment Analysis

    A comprehensive 3-hour MLOps course teaches students to build a production-ready YouTube sentiment analysis pipeline. The curriculum covers data collection and preprocessing using Reddit sentiment data, model development with MLflow tracking on AWS, and deployment using Flask and Docker. Students learn advanced practices including DVC version control, model stacking, and CI/CD automation. The course culminates in creating a Chrome extension for real-time YouTube comment sentiment analysis, providing hands-on experience with modern ML tools and bridging the gap between theory and deployable ML solutions.

  15.
    Article
    Community Picks · 1y

    5 Must-Know Open-Source Tools for DevOps and MLOps Developers

    DevOps and MLOps are essential for streamlining development and deployment workflows. This post highlights five open-source tools that are crucial: KitOps for packaging AI/ML projects, Kubernetes for container orchestration, Pulumi for cloud resource management, Dagger for CI/CD pipelines, and Jenkins for automation. Each tool offers unique features to enhance productivity and simplify complex processes in software development and machine learning operations.

  16.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 32w

    AI Agent Deployment Strategies

    Four deployment patterns for AI agents are explored: batch deployment for scheduled bulk processing with high throughput, stream deployment for continuous real-time data pipeline processing, real-time deployment via APIs for instant user interactions, and edge deployment on user devices for privacy and offline functionality. Each pattern serves different performance requirements, with batch optimizing throughput, stream enabling continuous monitoring, real-time providing sub-second responses, and edge ensuring data privacy without server dependencies.

  17.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 40w

    The Full MLOps/LLMOps Blueprint

    A comprehensive crash course covering MLOps and LLMOps fundamentals, from foundational concepts to hands-on implementations. The series explores ML system lifecycle, data pipelines, model training, deployment, and monitoring. Part 3 focuses specifically on reproducibility and versioning using tools like Git, DVC, and MLflow, emphasizing that ML systems require extensive infrastructure beyond just the ML code itself.
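
    The core idea behind reproducibility tooling can be illustrated without any of the named tools: derive an identifier from the exact data and parameters of a run, so that any change to either yields a new version. The `fingerprint` helper below is a hypothetical stand-in written for illustration; it does not use or imitate the Git, DVC, or MLflow APIs.

```python
import hashlib
import json


def fingerprint(data: bytes, params: dict) -> str:
    """Hash the dataset bytes together with the run parameters."""
    h = hashlib.sha256()
    h.update(data)
    # sort_keys makes the hash independent of dictionary ordering.
    h.update(json.dumps(params, sort_keys=True).encode())
    return h.hexdigest()[:12]


# Hypothetical dataset contents and hyperparameters.
data = b"id,label\n1,0\n2,1\n"
run_a = fingerprint(data, {"lr": 0.01, "epochs": 5})
run_b = fingerprint(data, {"epochs": 5, "lr": 0.01})  # same params, other order
run_c = fingerprint(data, {"lr": 0.02, "epochs": 5})  # changed learning rate
```

    Here `run_a` and `run_b` match, while `run_c` differs, which is the property a versioning scheme needs in order to tell runs apart.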

  18.
    Article
    Docker · 35w

    9 Rules for AI PoC Success That Actually Ship

    A practical guide for building AI proof-of-concepts that successfully transition to production systems. Introduces the concept of 'remocal workflows' (combining remote and local development) and outlines nine essential rules including starting small, designing for production from day zero, optimizing for repeatability, implementing feedback loops, solving real business problems, tracking costs early, establishing clear ownership, controlling expenses, and involving actual users throughout development.

  19.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 38w

    Data and Pipeline Engineering for ML Systems (With Implementation)

    A comprehensive MLOps crash course covering data and pipeline engineering for ML systems. The series explores data sources, ETL pipelines, model training, deployment, versioning, and reproducibility. It includes hands-on implementations using tools like PyTorch, MLflow, Git, DVC, and Weights & Biases, providing both foundational concepts and practical system-level thinking for production ML environments.

  20.
    Article
    Medium · 1y

    AI Engineer Interview Questions and Answers

    The post provides a comprehensive guide on common AI engineer interview questions and their corresponding answers. Topics covered include linear regression assumptions, handling imbalanced data, backpropagation in neural networks, Transformer architecture, bias-variance tradeoff, model deployment, and various machine learning and deep learning concepts. It also addresses deployment and monitoring of models, preventing overfitting, challenges in real-world dataset management, and differences between various algorithms and techniques.

  21.
    Article
    SwirlAI · 1y

    Building AI Agents from scratch - Part 2: Reflection and Working Memory

    Learn about the Reflection pattern in AI agent systems, its relation to short-term memory, and how to implement an Agent class that utilizes Reflection to improve performance. This guide offers code examples, explains pros and cons, and showcases the connection between agent memory and Reflection capabilities. The practical example includes revising an action plan generated by an AI agent to fix hallucinations and improve response accuracy.
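
    A Reflection loop can be sketched as generate, critique, revise, with earlier drafts and feedback held in working memory. The version below is a minimal sketch, not the guide's Agent class: `reflect_loop`, the stubbed `generate` and `critique` functions, and the travel-plan example are hypothetical, and a real agent would call an LLM in both places.

```python
from typing import Callable, Optional


def reflect_loop(
    task: str,
    generate: Callable[[str, list], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, list]:
    """Draft an answer, then revise it until the critic has no feedback."""
    memory: list = []  # working memory: (draft, feedback) pairs
    draft = generate(task, memory)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied
            break
        memory.append((draft, feedback))
        draft = generate(task, memory)
    return draft, memory


# Stubs standing in for LLM calls.
def generate(task: str, memory: list) -> str:
    return "plan: check visa, book flight" if memory else "plan: book flight"


def critique(task: str, draft: str) -> Optional[str]:
    return None if "visa" in draft else "missing visa check"


final, history = reflect_loop("plan a trip", generate, critique)
```

    The `max_rounds` cap matters in practice, since each extra round costs another pair of model calls.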

  22.
    Article
    Lightbend · 1y

    What is AI orchestration? 21+ tools to consider in 2025

    AI orchestration streamlines the coordination and integration of AI systems, such as models, data pipelines, and infrastructure, through efficient workflows. It involves automating tasks, optimizing resource usage, and monitoring system operations. Unlike ML orchestration, which focuses on ML pipelines, AI orchestration includes managing entire AI systems and various components. The post highlights several tools for AI orchestration, each suited for different roles such as software engineers and data scientists, emphasizing their strengths in scalability, reliability, and usability.