Best of Machine Learning · December 2025

  1. Article
    LangChain · 18w

    Agent Engineering: A New Discipline

    Agent engineering is an iterative discipline for building reliable LLM-based agents in production. It combines product thinking (prompt writing, defining scope), engineering (building tools, infrastructure, UI), and data science (evaluation, monitoring, analysis) in a continuous cycle of build, test, ship, observe, and refine. Unlike traditional software, agents face unpredictable natural language inputs and behave non-deterministically, so production deployment is essential for learning what actually works. Successful teams treat shipping as a learning mechanism rather than an end goal, using tracing and evaluation to systematically improve agent reliability through rapid iteration.

  2. Article
    Daily Dose of Data Science | Avi Chawla | Substack · 16w

    The AI Engineering Guidebook

    A comprehensive 350+ page guidebook covering the engineering fundamentals of LLM systems, including model architecture, training, prompt engineering, RAG systems, fine-tuning techniques like LoRA, AI agents, Model Context Protocol, optimization strategies, and deployment considerations. The resource focuses on practical engineering decisions, system design tradeoffs, and real-world implementation patterns rather than surface-level usage.

  3. Article
    MIT News · 17w

    Deep-learning model predicts how fruit flies form, cell by cell

    MIT researchers developed a deep-learning model that predicts cell-by-cell development during fruit fly embryo formation with 90% accuracy. The model uses a dual-graph structure representing cells as both point clouds and foam-like bubbles, tracking properties like position, division, and folding minute-by-minute during gastrulation. The approach could eventually predict development in more complex organisms and identify early disease patterns in conditions like asthma and cancer, though high-quality video data remains the primary limitation for broader applications.

  4. Article
    Android Developers Blog · 17w

    Build smarter apps with Gemini 3 Flash

    Gemini 3 Flash is now available through Firebase AI Logic, offering frontier AI intelligence optimized for speed and cost-effectiveness. The model excels at reasoning, tool use, and multimodal capabilities including video analysis and visual Q&A. Integration is straightforward using Firebase AI Logic SDK for Android apps, with features like AI monitoring dashboards for tracking latency and costs, and server-side prompt templates for secure prompt management. Gemini 3 Flash is also available in Android Studio for development assistance at no cost, with higher rate limits accessible through AI Studio API keys or Gemini Code Assist licenses.

  5. Article
    Simple Thread · 19w

    Getting Back to Basics

    A hands-on exploration of building machine learning models from scratch, starting with a trading algorithm using regression trees that achieved 220% returns on historical stock data. The author then tackles energy demand forecasting by implementing a feed-forward neural network with backpropagation before upgrading to LSTM networks to handle temporal patterns. Key challenges include addressing gradient explosion through data scaling, switching from ReLU to tanh activation functions, and implementing the Adam optimizer. The final LSTM model with 50 neurons successfully predicts hourly energy interconnection flows without overfitting, demonstrating that foundational ML techniques remain powerful tools for practical time-series forecasting problems.
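
To make the from-scratch approach concrete, here is a minimal sketch of the kind of network the author starts with: one hidden layer, tanh activation, and hand-written backpropagation, with input scaling to keep gradients tame. The noisy daily sine wave is an assumption standing in for the energy-demand data (which is not reproduced in the post), and this is plain gradient descent rather than the author's Adam/LSTM setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy hourly series: a noisy daily sine wave stands in for the
# real energy-interconnection data used in the article.
t = np.arange(2000)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

# Scale to zero mean / unit variance; unscaled inputs are a common
# cause of the exploding gradients the author ran into.
series = (series - series.mean()) / series.std()

window = 24  # predict the next hour from the previous 24
X = np.stack([series[i:i + window] for i in range(series.size - window)])
y = series[window:]

# One hidden layer with tanh (bounded, unlike ReLU), trained by
# hand-written backpropagation and full-batch gradient descent.
hidden = 16
W1 = rng.standard_normal((window, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = rng.standard_normal(hidden) * 0.1
b2 = 0.0
lr = 0.05

losses = []
for epoch in range(300):
    h = np.tanh(X @ W1 + b1)                   # forward pass
    pred = h @ W2 + b2
    err = pred - y
    losses.append(np.mean(err ** 2))
    g_pred = 2 * err / y.size                  # dMSE/dpred
    g_W2 = h.T @ g_pred                        # backprop: output layer
    g_b2 = g_pred.sum()
    g_h = np.outer(g_pred, W2) * (1 - h ** 2)  # tanh'(x) = 1 - tanh(x)^2
    g_W1 = X.T @ g_h                           # backprop: hidden layer
    g_b1 = g_h.sum(axis=0)
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
```

Swapping in an LSTM, as the article does, replaces the fixed 24-hour window with a recurrent state that can carry longer temporal patterns.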

  6. Article
    Claude · 17w

    Making Claude a better electrical engineer

    Anthropic partnered with Diode Computers to improve Claude's ability to auto-generate electrical circuit board reference designs from chip documentation. The collaboration focused on teaching Claude to work with Zener, a domain-specific language for PCB schematics, and to interpret dense technical documentation. Claude Sonnet 4.5 now produces reference designs preferred by electrical engineers 8 out of 10 times compared to earlier versions, better capturing documentation nuances and following toolchain conventions. This demonstrates how domain experts can collaborate with Anthropic to enhance Claude's performance on specialized technical tasks.

  7. Article
    Machine Learning Mastery · 19w

    The Roadmap for Mastering Agentic AI in 2026

    A comprehensive learning path for building autonomous AI systems that can plan, reason, and act independently. Covers foundational mathematics and programming, machine learning fundamentals, autonomous agent architectures, specialization areas like robotics and workflow automation, deployment strategies using Docker and cloud platforms, and portfolio development. Includes curated resources from beginner prerequisites through advanced topics like multi-agent systems, transformer-based decision-making, and reinforcement learning with human feedback.

  8. Video
    Fireship · 18w

    OpenAI is edging us all... Closer to AGI

    OpenAI released GPT 5.2, reclaiming leadership in AI benchmarks after Google's Gemini 3 dominance. The model shows a 390x efficiency improvement over its predecessor and tops the ARC AGI benchmark, which tests reasoning and generalization rather than memorization. The release includes improved coding capabilities and fewer hallucinations, though practical differences may be subtle for average users. OpenAI also secured a $1 billion deal with Disney for AI-generated content featuring iconic characters.

  9. Article
    The Palindrome · 19w

    The Story of the Mathematics of Machine Learning Book

    A mathematician shares his four-year journey of accidentally writing a 700-page machine learning textbook while building an audience through Twitter threads and Substack. Starting as a creative outlet after a failed startup, he validated the idea through early access sales, navigated platform algorithm changes, and eventually partnered with Packt Publishing. The story covers content creation strategies, the challenges of self-publishing versus traditional publishing, and how constraints like Twitter's character limit shaped his teaching style and visual approach to explaining complex mathematical concepts.

  10. Article
    Hacker News · 19w

    LLMs are a failure. A new AI winter is coming.

    Large Language Models (LLMs) face fundamental limitations that make them unsuitable for most practical applications. The core issue is that transformers generate plausible-sounding output by predicting the next token, which inevitably leads to hallucinations when the model lacks relevant training data. This results in a 5-40% failure rate that cannot be eliminated through scaling or fine-tuning. The author predicts an imminent AI bubble burst, with corporate AI projects failing at a 95% rate, similar to the dot-com crash. While some use cases will survive, the technology's inability to reliably distinguish correct from incorrect output makes it dangerous for critical applications like medicine, education, and law enforcement.

  11. Article
    Towards Data Science · 17w

    6 Technical Skills That Make You a Senior Data Scientist

    Senior data scientists distinguish themselves through a structured six-stage workflow for building data products: mapping the business ecosystem, defining product constraints as operators, designing systems end-to-end before coding, starting with simple models and adding complexity only when justified, rigorously evaluating outputs through manual review and appropriate metrics, and tailoring communication to different audiences (product managers, engineers, other data scientists). The emphasis is on understanding context, making design-level trade-offs, and delivering production-ready solutions rather than just technical coding ability.

  12. Article
    Prince Kumar · 16w

    Every developer, every time

  13. Video
    bycloud · 19w

    how this tiny model beat ChatGPT on the “AGI” benchmark [HRM & TRM]

    Two novel AI models, HRM (27M parameters) and TRM (7M parameters), challenge the scaling paradigm by outperforming large language models like GPT-4 on the ARC AGI benchmark through recursive reasoning. Instead of processing everything in one pass, these tiny models iteratively refine answers using dual-network architectures with fast and slow update cycles. TRM achieves 40% on ARC AGI with just 7 million parameters by training on actual loop behavior rather than assumed equilibrium states. Empirical results show that smaller models with more recursion outperform larger models with more layers, suggesting that for constrained logical tasks, iterative refinement beats raw parameter scaling.
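
The core idea — repeatedly applying one small update rule instead of one big forward pass — can be illustrated with a deliberately simple analogy (not HRM/TRM themselves). Here the "tiny model" is Newton's update for computing a square root: a single application gives a rough answer, while iterating the same cheap step refines it to high precision.

```python
# Toy illustration of iterative refinement (an analogy, not HRM/TRM):
# the same small update, applied recursively, converges on the answer.
def refine(z: float, a: float) -> float:
    # One cheap refinement step (Newton's update for sqrt(a)).
    return 0.5 * (z + a / z)

def solve(a: float, steps: int) -> float:
    z = 1.0                     # crude initial guess
    for _ in range(steps):      # more recursion, same tiny "model"
        z = refine(z, a)
    return z

one_pass = solve(2.0, steps=1)   # single pass: rough estimate (1.5)
many_pass = solve(2.0, steps=8)  # recursion: accurate sqrt(2)
```

The models in the video work in a loosely similar spirit: depth comes from looping a small network, not from stacking more layers.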

  14. Article
    MLflow · 16w

    AI Observability for Every TypeScript LLM Stack

    MLflow 3.6 introduces automatic tracing integrations for TypeScript and JavaScript LLM frameworks including Vercel AI SDK, LangChain.js, LangGraph.js, Mastra, Anthropic, and Gemini. These integrations use OpenTelemetry to send traces to MLflow's tracking server, capturing prompt/response payloads, token usage, tool results, and errors. Setup requires minimal configuration—typically just pointing an OTLP endpoint to your MLflow server and wrapping SDK clients. MLflow can be deployed via Docker Compose or managed cloud services, eliminating the need for a Python environment alongside JavaScript stacks.

  15. Article
    Sebastian Raschka · 18w

    From Random Forests to RLVR: A Short History of ML/AI Hello Worlds

    A chronological overview traces the evolution of beginner-friendly ML/AI examples from 2013 to 2025. Starting with Random Forests on Iris datasets and XGBoost on Kaggle competitions, it progresses through neural networks (MLPs, AlexNet), transformer models (DistilBERT, Llama 2 with LoRA), and culminates with reasoning models using RLVR on mathematical datasets. Each milestone reflects when methods became mainstream and accessible, often lagging years behind their initial publication due to tooling maturity and community adoption.
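
The 2013-era starting point the post describes still fits in a few lines. This is the canonical scikit-learn version of that "hello world" (the exact code in the article may differ); the random seed and split ratio here are arbitrary choices.

```python
# The classic ML "hello world": a Random Forest on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # held-out accuracy on this easy task
```

Contrast that with the 2025-era "hello world" the post ends on — fine-tuning a reasoning model with RLVR — and the distance the field has covered in twelve years is clear.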

  16. Article
    AI Products · 19w

    SAM 3 just dropped, and it's a big deal

    Meta released SAM 3, an open-source computer vision model that enables text-based object segmentation in images and videos. The model supports multiple input methods including text prompts, clicks, and bounding boxes, and can track objects across video frames. Trained on over 4 million unique concepts, it reportedly delivers double the accuracy of competing systems on open-vocabulary segmentation tasks. The model is available on GitHub with weights and starter notebooks.

  17. Article
    Hugging Face · 19w

    Transformers v5: Simple model definitions powering the AI ecosystem

    Hugging Face releases Transformers v5, marking five years since v4 with daily installs growing from 20,000 to 3 million. The library now supports over 400 model architectures and 750,000 community checkpoints. Version 5 focuses on simplicity through modular design, improved training support for both pre-training and fine-tuning, enhanced inference capabilities with continuous batching and a new serving API, and first-class quantization support. The release emphasizes interoperability across the ecosystem, enabling seamless integration with inference engines like vLLM and SGLang, local deployment tools like llama.cpp and MLX, and training frameworks like Unsloth and Axolotl.

  18. Article
    vLLM · 17w

    Token-Level Truth: Real-Time Hallucination Detection for Production LLMs

    HaluGate is a real-time hallucination detection system for production LLMs that identifies when models generate claims contradicting provided context. It uses a two-stage pipeline: first classifying whether queries need fact-checking (96.4% accuracy, 12ms latency), then performing token-level detection with NLI explanation for factual queries (76-162ms overhead). Built with ModernBERT and native Rust/Candle integration, it runs without Python dependencies, adding negligible latency compared to LLM generation times. The system integrates with vLLM's Signal-Decision Architecture, exposing results via HTTP headers for downstream policy enforcement. Unlike LLM-as-judge approaches, HaluGate provides explainable, consistent verification specifically for extrinsic hallucinations where tool/RAG context exists.
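
The two-stage control flow can be sketched as follows. Both stages here are crude stand-in heuristics — the real system uses trained ModernBERT classifiers and token-level NLI — so this only shows the pipeline shape: a cheap gate that skips detection for non-factual queries, and a detector that flags answer tokens unsupported by the retrieved context.

```python
# Toy sketch of a two-stage hallucination gate (stand-in heuristics,
# not HaluGate's actual classifiers).
def needs_fact_check(query: str) -> bool:
    # Stage 1 stand-in: only "factual-looking" queries pass the gate.
    return any(w in query.lower() for w in ("who", "what", "when", "where"))

def flag_unsupported_tokens(answer: str, context: str) -> list[str]:
    # Stage 2 stand-in: flag answer tokens absent from the context,
    # a crude proxy for token-level contradiction detection.
    ctx = {w.strip(".,") for w in context.lower().split()}
    return [t.strip(".,") for t in answer.lower().split()
            if t.strip(".,") not in ctx]

def gate(query: str, answer: str, context: str) -> list[str]:
    if not needs_fact_check(query):  # cheap path: skip detection entirely
        return []
    return flag_unsupported_tokens(answer, context)

flags = gate("Who wrote the report?",
             "The report was written by Alice in 2019.",
             "The report was written by Alice.")
```

In the real system, the flags would surface as HTTP headers on the vLLM response for downstream policy enforcement rather than a returned list.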

  19. Video
    Siliconversations · 15w

    Why Does The Seahorse Emoji Drive ChatGPT Insane?

    ChatGPT enters an infinite loop when asked about the seahorse emoji because it predicts one should exist but cannot produce it. As a next-word predictor, the model gets stuck repeatedly trying to correct itself. The issue likely stems from Reddit posts in its training data where people falsely remember a seahorse emoji existing (Mandela effect), creating a contradiction between what the model expects to exist and what it can actually output.

  20. Article
    Jeff Geerling · 17w

    1.5 TB of VRAM on Mac Studio - RDMA over Thunderbolt 5

    Testing RDMA over Thunderbolt 5 on a four-Mac Studio cluster with 1.5 TB unified memory shows significant performance gains for running massive AI models. The M3 Ultra Mac Studio outperforms comparable systems from Nvidia and AMD in CPU, AI inference, and power efficiency benchmarks. RDMA support in Exo 1.0 enables linear performance scaling across nodes, achieving 30+ tokens/second on trillion-parameter models. However, limitations include Thunderbolt 5's four-node maximum, macOS cluster management challenges, stability issues with prerelease software, and lack of standard networking options like QSFP for larger deployments.

  21. Article
    Hacker News · 19w

    Tongyi-MAI/Z-Image

    Z-Image is a 6B parameter image generation model featuring three variants: Z-Image-Turbo (distilled for sub-second inference with 8 NFEs on H800 GPUs), Z-Image-Base (foundation model for fine-tuning), and Z-Image-Edit (specialized for image editing). Built on a Scalable Single-Stream DiT architecture, it excels at photorealistic generation, bilingual text rendering (English/Chinese), and instruction following. The model uses Decoupled-DMD distillation algorithm and DMDR (combining DMD with reinforcement learning) for few-step generation optimization. Available on Hugging Face and ModelScope with PyTorch and Diffusers support.

  22. Article
    Cloudflare · 19w

    Why Replicate is joining Cloudflare

    Replicate, a platform for running machine learning models as APIs, has been acquired by Cloudflare. Founded in 2019 to make research models accessible to developers through tools like Cog, Replicate became a key infrastructure provider during the Stable Diffusion era. The acquisition enables integration with Cloudflare's network infrastructure, Workers, R2, and other services to build a comprehensive AI stack. The combined platform aims to support edge model execution, instant-booting Workers for model pipelines, and WebRTC streaming for model inputs and outputs.

  23. Article
    Valdemar · 19w

    OpenAGI launched something interesting - Lux

    OpenAGI released Lux, a foundation AI agent that controls computers through screenshots and action sequences rather than text. It outperforms competing solutions from OpenAI, Google, and Anthropic on real-world tasks (83.6% vs 69% for Gemini CUA), operates faster (~1 second per step), and costs 10× less. Unlike browser-only alternatives, Lux works across desktop applications including Excel, Slack, Adobe products, and IDEs. The model is available via API and SDK, with Intel collaboration underway for local laptop optimization.