Best of Machine Learning · December 2025

  1. Article
    LangChain · 18w

    Agent Engineering: A New Discipline

    Agent engineering is an iterative discipline for building reliable LLM-based agents in production. It combines product thinking (prompt writing, defining scope), engineering (building tools, infrastructure, UI), and data science (evaluation, monitoring, analysis) in a continuous cycle of build, test, ship, observe, and refine. Unlike traditional software, agents face unpredictable natural language inputs and behave non-deterministically, so production deployment is essential for learning what actually works. Successful teams treat shipping as a learning mechanism rather than an end goal, using tracing and evaluation to systematically improve agent reliability through rapid iteration.

  2. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 16w

    The AI Engineering Guidebook

    A comprehensive 350+ page guidebook covering the engineering fundamentals of LLM systems, including model architecture, training, prompt engineering, RAG systems, fine-tuning techniques like LoRA, AI agents, Model Context Protocol, optimization strategies, and deployment considerations. The resource focuses on practical engineering decisions, system design tradeoffs, and real-world implementation patterns rather than surface-level usage.

  3. Article
    MIT News · 17w

    Deep-learning model predicts how fruit flies form, cell by cell

    MIT researchers developed a deep-learning model that predicts cell-by-cell development during fruit fly embryo formation with 90% accuracy. The model uses a dual-graph structure representing cells as both point clouds and foam-like bubbles, tracking properties like position, division, and folding minute-by-minute during gastrulation. The approach could eventually predict development in more complex organisms and identify early disease patterns in conditions like asthma and cancer, though high-quality video data remains the primary limitation for broader applications.

  4. Article
    Android Developers Blog · 17w

    Build smarter apps with Gemini 3 Flash

    Gemini 3 Flash is now available through Firebase AI Logic, offering frontier AI intelligence optimized for speed and cost-effectiveness. The model excels at reasoning, tool use, and multimodal capabilities including video analysis and visual Q&A. Integration is straightforward using Firebase AI Logic SDK for Android apps, with features like AI monitoring dashboards for tracking latency and costs, and server-side prompt templates for secure prompt management. Gemini 3 Flash is also available in Android Studio for development assistance at no cost, with higher rate limits accessible through AI Studio API keys or Gemini Code Assist licenses.

  5. Article
    Simple Thread · 19w

    Getting Back to Basics

    A hands-on exploration of building machine learning models from scratch, starting with a trading algorithm using regression trees that achieved 220% returns on historical stock data. The author then tackles energy demand forecasting by implementing a feed-forward neural network with backpropagation before upgrading to LSTM networks to handle temporal patterns. Key challenges include addressing gradient explosion through data scaling, switching from ReLU to tanh activation functions, and implementing the Adam optimizer. The final LSTM model with 50 neurons successfully predicts hourly energy interconnection flows without overfitting, demonstrating that foundational ML techniques remain powerful tools for practical time-series forecasting problems.
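
To make the from-scratch approach concrete, here is a minimal sketch of the kind of network the author starts with: one hidden layer, tanh activation, and hand-written backpropagation, with input scaling to keep gradients tame. The noisy daily sine wave is an assumption standing in for the energy-demand data (which is not reproduced in the post), and this is plain gradient descent rather than the author's Adam/LSTM setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy hourly series: a noisy daily sine wave stands in for the
# real energy-interconnection data used in the article.
t = np.arange(2000)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

# Scale to zero mean / unit variance; unscaled inputs are a common
# cause of the exploding gradients the author ran into.
series = (series - series.mean()) / series.std()

window = 24  # predict the next hour from the previous 24
X = np.stack([series[i:i + window] for i in range(series.size - window)])
y = series[window:]

# One hidden layer with tanh (bounded, unlike ReLU), trained by
# hand-written backpropagation and full-batch gradient descent.
hidden = 16
W1 = rng.standard_normal((window, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal(hidden) * 0.1
b2 = 0.0
lr = 0.05

losses = []
for epoch in range(300):
    h = np.tanh(X @ W1 + b1)                   # forward pass
    pred = h @ W2 + b2
    err = pred - y
    losses.append(np.mean(err ** 2))
    g_pred = 2 * err / y.size                  # dMSE/dpred
    g_W2 = h.T @ g_pred                        # backprop: output layer
    g_b2 = g_pred.sum()
    g_h = np.outer(g_pred, W2) * (1 - h ** 2)  # tanh'(x) = 1 - tanh(x)^2
    g_W1 = X.T @ g_h                           # backprop: hidden layer
    g_b1 = g_h.sum(axis=0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
```

Swapping in an LSTM, as the article does, replaces the fixed 24-hour window with a recurrent state that can carry longer temporal patterns.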

  6. Article
    Claude · 17w

    Making Claude a better electrical engineer

    Anthropic partnered with Diode Computers to improve Claude's ability to auto-generate electrical circuit board reference designs from chip documentation. The collaboration focused on teaching Claude to work with Zener, a domain-specific language for PCB schematics, and to interpret dense technical documentation. Claude Sonnet 4.5 now produces reference designs preferred by electrical engineers 8 out of 10 times compared to earlier versions, better capturing documentation nuances and following toolchain conventions. This demonstrates how domain experts can collaborate with Anthropic to enhance Claude's performance on specialized technical tasks.

  7. Article
    Machine Learning Mastery · 19w

    The Roadmap for Mastering Agentic AI in 2026

    A comprehensive learning path for building autonomous AI systems that can plan, reason, and act independently. Covers foundational mathematics and programming, machine learning fundamentals, autonomous agent architectures, specialization areas like robotics and workflow automation, deployment strategies using Docker and cloud platforms, and portfolio development. Includes curated resources from beginner prerequisites through advanced topics like multi-agent systems, transformer-based decision-making, and reinforcement learning with human feedback.

  8. Video
    Fireship · 18w

    OpenAI is edging us all... Closer to AGI

    OpenAI released GPT 5.2, reclaiming leadership in AI benchmarks after Google's Gemini 3 dominance. The model shows a 390x efficiency improvement over its predecessor and tops the ARC AGI benchmark, which tests reasoning and generalization rather than memorization. The release includes improved coding capabilities and fewer hallucinations, though practical differences may be subtle for average users. OpenAI also secured a $1 billion deal with Disney for AI-generated content featuring iconic characters.

  9. Article
    The Palindrome · 19w

    The Story of the Mathematics of Machine Learning Book

    A mathematician shares his four-year journey of accidentally writing a 700-page machine learning textbook while building an audience through Twitter threads and Substack. Starting as a creative outlet after a failed startup, he validated the idea through early access sales, navigated platform algorithm changes, and eventually partnered with Packt Publishing. The story covers content creation strategies, the challenges of self-publishing versus traditional publishing, and how constraints like Twitter's character limit shaped his teaching style and visual approach to explaining complex mathematical concepts.

  10. Article
    Hacker News · 19w

    LLMs are a failure. A new AI winter is coming.

    Large Language Models (LLMs) face fundamental limitations that make them unsuitable for most practical applications. The core issue is that transformers generate plausible-sounding output by predicting the next token, which inevitably leads to hallucinations when the model lacks relevant training data. This results in a 5-40% failure rate that cannot be eliminated through scaling or fine-tuning. The author predicts an imminent AI bubble burst, with corporate AI projects failing at a 95% rate, similar to the dot-com crash. While some use cases will survive, the technology's inability to reliably distinguish correct from incorrect output makes it dangerous for critical applications like medicine, education, and law enforcement.

  11. Article
    Towards Data Science · 17w

    6 Technical Skills That Make You a Senior Data Scientist

    Senior data scientists distinguish themselves through a structured six-stage workflow for building data products: mapping the business ecosystem, defining product constraints as operators, designing systems end-to-end before coding, starting with simple models and adding complexity only when justified, rigorously evaluating outputs through manual review and appropriate metrics, and tailoring communication to different audiences (product managers, engineers, other data scientists). The emphasis is on understanding context, making design-level trade-offs, and delivering production-ready solutions rather than just technical coding ability.

  12. Article
    Prince Kumar · 16w

    Every developer, every time

  13. Video
    bycloud · 19w

    how this tiny model beat ChatGPT on the “AGI” benchmark [HRM & TRM]

    Two novel AI models, HRM (27M parameters) and TRM (7M parameters), challenge the scaling paradigm by outperforming large language models like GPT-4 on the ARC AGI benchmark through recursive reasoning. Instead of processing everything in one pass, these tiny models iteratively refine answers using dual-network architectures with fast and slow update cycles. TRM achieves 40% on ARC AGI with just 7 million parameters by training on actual loop behavior rather than assumed equilibrium states. Empirical results show that smaller models with more recursion outperform larger models with more layers, suggesting that for constrained logical tasks, iterative refinement beats raw parameter scaling.
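
The core idea — repeatedly applying one small update rule instead of one big forward pass — can be illustrated with a deliberately simple analogy (not HRM/TRM themselves). Here the "tiny model" is Newton's update for computing a square root: a single application gives a rough answer, while iterating the same cheap step refines it to high precision.

```python
# Toy illustration of iterative refinement (an analogy, not HRM/TRM):
# the same small update, applied recursively, converges on the answer.
def refine(z: float, a: float) -> float:
    # One cheap refinement step (Newton's update for sqrt(a)).
    return 0.5 * (z + a / z)

def solve(a: float, steps: int) -> float:
    z = 1.0                     # crude initial guess
    for _ in range(steps):      # more recursion, same tiny "model"
        z = refine(z, a)
    return z

one_pass = solve(2.0, steps=1)   # single pass: rough estimate (1.5)
many_pass = solve(2.0, steps=8)  # recursion: accurate sqrt(2)
```

The models in the video work in a loosely similar spirit: depth comes from looping a small network, not from stacking more layers.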

  14. Article
    MLflow · 16w

    AI Observability for Every TypeScript LLM Stack

    MLflow 3.6 introduces automatic tracing integrations for TypeScript and JavaScript LLM frameworks including Vercel AI SDK, LangChain.js, LangGraph.js, Mastra, Anthropic, and Gemini. These integrations use OpenTelemetry to send traces to MLflow's tracking server, capturing prompt/response payloads, token usage, tool results, and errors. Setup requires minimal configuration—typically just pointing an OTLP endpoint to your MLflow server and wrapping SDK clients. MLflow can be deployed via Docker Compose or managed cloud services, eliminating the need for a Python environment alongside JavaScript stacks.

  15. Article
    Sebastian Raschka · 18w

    From Random Forests to RLVR: A Short History of ML/AI Hello Worlds

    A chronological overview traces the evolution of beginner-friendly ML/AI examples from 2013 to 2025. Starting with Random Forests on Iris datasets and XGBoost on Kaggle competitions, it progresses through neural networks (MLPs, AlexNet), transformer models (DistilBERT, Llama 2 with LoRA), and culminates with reasoning models using RLVR on mathematical datasets. Each milestone reflects when methods became mainstream and accessible, often lagging years behind their initial publication due to tooling maturity and community adoption.
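
The 2013-era starting point the post describes still fits in a few lines. This is the canonical scikit-learn version of that "hello world" (the exact code in the article may differ); the random seed and split ratio here are arbitrary choices.

```python
# The classic ML "hello world": a Random Forest on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # held-out accuracy on this easy task
```

Contrast that with the 2025-era "hello world" the post ends on — fine-tuning a reasoning model with RLVR — and the distance the field has covered in twelve years is clear.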

  16. Article
    AI Products · 19w

    SAM 3 just dropped, and it's a big deal

    Meta released SAM 3, an open-source computer vision model that enables text-based object segmentation in images and videos. The model supports multiple input methods including text prompts, clicks, and bounding boxes, and can track objects across video frames. Trained on over 4 million unique concepts, it reportedly delivers double the accuracy of competing systems on open-vocabulary segmentation tasks. The model is available on GitHub with weights and starter notebooks.

  17. Article
    Hugging Face · 19w

    Transformers v5: Simple model definitions powering the AI ecosystem

    Hugging Face releases Transformers v5, marking five years since v4 with daily installs growing from 20,000 to 3 million. The library now supports over 400 model architectures and 750,000 community checkpoints. Version 5 focuses on simplicity through modular design, improved training support for both pre-training and fine-tuning, enhanced inference capabilities with continuous batching and a new serving API, and first-class quantization support. The release emphasizes interoperability across the ecosystem, enabling seamless integration with inference engines like vLLM and SGLang, local deployment tools like llama.cpp and MLX, and training frameworks like Unsloth and Axolotl.

  18. Article
    vLLM · 17w

    Token-Level Truth: Real-Time Hallucination Detection for Production LLMs

    HaluGate is a real-time hallucination detection system for production LLMs that identifies when models generate claims contradicting provided context. It uses a two-stage pipeline: first classifying whether queries need fact-checking (96.4% accuracy, 12ms latency), then performing token-level detection with NLI explanation for factual queries (76-162ms overhead). Built with ModernBERT and native Rust/Candle integration, it runs without Python dependencies, adding negligible latency compared to LLM generation times. The system integrates with vLLM's Signal-Decision Architecture, exposing results via HTTP headers for downstream policy enforcement. Unlike LLM-as-judge approaches, HaluGate provides explainable, consistent verification specifically for extrinsic hallucinations where tool/RAG context exists.
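
The two-stage control flow can be sketched as follows. Both stages here are crude stand-in heuristics — the real system uses trained ModernBERT classifiers and token-level NLI — so this only shows the pipeline shape: a cheap gate that skips detection for non-factual queries, and a detector that flags answer tokens unsupported by the retrieved context.

```python
# Toy sketch of a two-stage hallucination gate (stand-in heuristics,
# not HaluGate's actual classifiers).
def needs_fact_check(query: str) -> bool:
    # Stage 1 stand-in: only "factual-looking" queries pass the gate.
    return any(w in query.lower() for w in ("who", "what", "when", "where"))

def flag_unsupported_tokens(answer: str, context: str) -> list[str]:
    # Stage 2 stand-in: flag answer tokens absent from the context,
    # a crude proxy for token-level contradiction detection.
    ctx = {w.strip(".,") for w in context.lower().split()}
    return [t.strip(".,") for t in answer.lower().split()
            if t.strip(".,") not in ctx]

def gate(query: str, answer: str, context: str) -> list[str]:
    if not needs_fact_check(query):  # cheap path: skip detection entirely
        return []
    return flag_unsupported_tokens(answer, context)

flags = gate("Who wrote the report?",
             "The report was written by Alice in 2019.",
             "The report was written by Alice.")
```

In the real system, the flags would surface as HTTP headers on the vLLM response for downstream policy enforcement rather than a returned list.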

  19. Video
    Siliconversations · 15w

    Why Does The Seahorse Emoji Drive ChatGPT Insane?

    ChatGPT enters an infinite loop when asked about the seahorse emoji because it predicts one should exist but cannot produce it. As a next-word predictor, the model gets stuck repeatedly trying to correct itself. The issue likely stems from Reddit posts in its training data where people falsely remember a seahorse emoji existing (Mandela effect), creating a contradiction between what the model expects to exist and what it can actually output.

  20. Article
    Jeff Geerling · 17w

    1.5 TB of VRAM on Mac Studio - RDMA over Thunderbolt 5

    Testing RDMA over Thunderbolt 5 on a four-Mac Studio cluster with 1.5 TB unified memory shows significant performance gains for running massive AI models. The M3 Ultra Mac Studio outperforms comparable systems from Nvidia and AMD in CPU, AI inference, and power efficiency benchmarks. RDMA support in Exo 1.0 enables linear performance scaling across nodes, achieving 30+ tokens/second on trillion-parameter models. However, limitations include Thunderbolt 5's four-node maximum, macOS cluster management challenges, stability issues with prerelease software, and lack of standard networking options like QSFP for larger deployments.

  21. Article
    Hacker News · 19w

    Tongyi-MAI/Z-Image

    Z-Image is a 6B parameter image generation model featuring three variants: Z-Image-Turbo (distilled for sub-second inference with 8 NFEs on H800 GPUs), Z-Image-Base (foundation model for fine-tuning), and Z-Image-Edit (specialized for image editing). Built on a Scalable Single-Stream DiT architecture, it excels at photorealistic generation, bilingual text rendering (English/Chinese), and instruction following. The model uses Decoupled-DMD distillation algorithm and DMDR (combining DMD with reinforcement learning) for few-step generation optimization. Available on Hugging Face and ModelScope with PyTorch and Diffusers support.

  22. Article
    Cloudflare · 19w

    Why Replicate is joining Cloudflare

    Replicate, a platform for running machine learning models as APIs, has been acquired by Cloudflare. Founded in 2019 to make research models accessible to developers through tools like Cog, Replicate became a key infrastructure provider during the Stable Diffusion era. The acquisition enables integration with Cloudflare's network infrastructure, Workers, R2, and other services to build a comprehensive AI stack. The combined platform aims to support edge model execution, instant-booting Workers for model pipelines, and WebRTC streaming for model inputs and outputs.

  23. Article
    Valdemar · 19w

    OpenAGI launched something interesting - Lux

    OpenAGI released Lux, a foundation AI agent that controls computers through screenshots and action sequences rather than text. It outperforms competing solutions from OpenAI, Google, and Anthropic on real-world tasks (83.6% vs 69% for Gemini CUA), operates faster (~1 second per step), and costs 10× less. Unlike browser-only alternatives, Lux works across desktop applications including Excel, Slack, Adobe products, and IDEs. The model is available via API and SDK, with Intel collaboration underway for local laptop optimization.