Best of Machine Learning: November 2025

  1. Article
    TechCrunch·21w

    Hugging Face CEO says we’re in an ‘LLM bubble,’ not an ‘AI bubble’

    Hugging Face CEO Clem Delangue argues the tech industry is in an LLM bubble rather than a broader AI bubble, and predicts it may burst soon. He believes the current focus on large, general-purpose language models is misplaced and that smaller, specialized models will win for specific use cases such as banking chatbots. While competitors spend billions on LLM infrastructure, Hugging Face maintains a capital-efficient approach, with half of its $400 million in funding still in reserve, positioning itself for the long term across a diversified AI landscape.

  2. Article
    Simon Willison·21w

    Olmo 3 is a fully open LLM

    Ai2 released Olmo 3, a fully open LLM series that includes complete training data, process, and checkpoints. The flagship 32B Think model emphasizes interpretability with visible reasoning traces through OlmoTrace. Trained on 5.9 trillion tokens from the Dolma 3 Mix dataset (6x fewer tokens than competitors), it offers four 7B variants and two 32B models. The release enables auditing training data to detect potential backdoors, addressing security concerns in open-weight models. Performance testing shows improved SVG generation compared to Olmo 2, though OlmoTrace's training data attribution needs refinement.

  3. Article
    InfoWorld·23w

    Perplexity’s open-source tool to run trillion-parameter models without costly upgrades

    Perplexity AI released TransferEngine, an open-source tool that enables trillion-parameter language models to run across different cloud providers' GPU hardware at full speed. The software solves vendor lock-in by creating a universal interface for GPU-to-GPU communication that works on both Nvidia ConnectX and AWS EFA networking protocols. This allows companies to run massive models like DeepSeek V3 and Kimi K2 on older H100 and H200 systems instead of purchasing expensive next-generation hardware. TransferEngine achieves 400 Gbps throughput using RDMA technology and is already powering Perplexity's production AI search engine, handling disaggregated inference, reinforcement learning, and Mixture-of-Experts routing.

  4. Article
    Towards Data Science·23w

    We Didn’t Invent Attention — We Just Rediscovered It

    Attention mechanisms in AI transformers aren't novel inventions but rediscoveries of fundamental optimization principles. The same mathematical pattern—selective amplification combined with normalization—emerges independently across evolution (500+ million years of neural systems), chemistry (autocatalytic reactions), and AI (gradient descent). This convergence suggests attention represents a universal solution to information processing under energy constraints. Reframing attention as amplification rather than selection offers practical insights for improving AI architectures: decoupling amplification from normalization, exploring non-content-based amplification, implementing local normalization pools, and designing systems that operate at critical dynamics for optimal information processing.
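
    The amplify-then-normalize pattern described above is exactly what softmax attention computes; this toy pure-Python sketch (illustrative, not from the article) makes the two steps explicit:

    ```python
    import math

    def attention_weights(scores):
        """Softmax as selective amplification (exp) followed by normalization."""
        amplified = [math.exp(s) for s in scores]  # amplification step
        total = sum(amplified)                     # normalization pool
        return [a / total for a in amplified]

    def attend(query_scores, values):
        """Mix values under the amplified-and-normalized weights."""
        w = attention_weights(query_scores)
        return sum(wi * vi for wi, vi in zip(w, values))

    weights = attention_weights([2.0, 1.0, 0.1])
    # Higher scores are amplified disproportionately, yet the weights sum to 1.
    ```

    The article's suggestion to decouple amplification from normalization amounts to varying those two steps independently, e.g. replacing the global `total` with a local normalization pool.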

  5. Article
    The New Stack·22w

    Why the Frontend Should Run AI Models Locally With ONNX

    Running AI models locally in the browser using ONNX Runtime Web offers significant advantages over cloud-based approaches. Local execution eliminates privacy concerns by keeping sensitive data on-device, enables offline functionality, and provides instant feedback loops. ONNX acts as a universal format for ML models, allowing models trained in PyTorch or TensorFlow to run anywhere via JavaScript. Angular's Signals feature (v16+) provides the performance isolation needed for heavy inference operations. The approach enables mixing local models for low-latency tasks with cloud calls for complex reasoning, while maintaining transparency about data handling.

  6. Article
    Databricks·23w

    Building Custom LLM Judges for AI Agent Accuracy

    MLflow introduces three new capabilities for evaluating AI agents: Tunable Judges for creating custom LLM evaluators using natural language instructions, Agent-as-a-Judge for automatically identifying relevant trace data without manual parsing, and Judge Builder for visual judge management with domain expert feedback. These tools enable teams to build domain-specific evaluation criteria, align judges with human feedback through continuous tuning, and scale quality assessment from prototype to production. The make_judge SDK simplifies creating custom judges, while alignment optimization incorporates subject matter expert feedback to improve evaluation accuracy over time.

  7. Article
    Aishwary Gupta·21w

    OpenAI dropped a cookbook on Self-Evolving Agents

    OpenAI released a comprehensive cookbook featuring open-source examples and tutorials for building applications with their API. The collection covers fundamental API usage through advanced implementations including fine-tuning, RAG, function calling, vector databases, multimodal applications, and self-evolving agent development. Practical guides span GPT models, embeddings, image generation, speech processing, and platform integrations.

  8. Article
    Cloudflare·21w

    Replicate is joining Cloudflare

    Cloudflare acquires Replicate, a platform for running AI models with 50,000+ models in its catalog. The integration will bring Replicate's model catalog and fine-tuning capabilities to Cloudflare's Workers AI platform, while maintaining existing APIs for current users. The combined platform aims to provide serverless GPU inference on Cloudflare's global network, unified model management through AI Gateway, and seamless integration with Cloudflare's developer tools including Workers, R2, Vectorize, and Durable Objects.

  9. Article
    Daily Dose of Data Science | Avi Chawla | Substack·23w

    RAG vs. CAG, Explained Visually!

    Cache-Augmented Generation (CAG) improves upon traditional RAG by caching static, rarely-changing information directly in the model's key-value memory, while continuing to retrieve dynamic data from vector databases. This hybrid approach reduces redundant fetches, lowers costs, and speeds up inference by separating stable "cold" data (cacheable) from frequently updated "hot" data (retrievable). The technique is already supported by APIs like OpenAI and Anthropic through prompt caching features.
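
    A minimal sketch of the hot/cold split, with an invented `vector_search` stub and a string standing in for the cached key-value state (illustrative only, not the article's code):

    ```python
    STATIC_CONTEXT = "Company policies: refunds accepted within 30 days..."  # cold data

    _kv_cache = {}

    def cached_prefix(doc):
        # CAG: the static context is processed once and its state is reused.
        if doc not in _kv_cache:
            _kv_cache[doc] = f"<precomputed KV state for {len(doc)} chars>"
        return _kv_cache[doc]

    def vector_search(query):
        # RAG: hot, frequently changing data still comes from a vector store (stub).
        return [f"fresh snippet for {query!r}"]

    def build_prompt(query):
        prefix = cached_prefix(STATIC_CONTEXT)  # cold path: cache hit after first call
        snippets = vector_search(query)         # hot path: retrieved on every query
        return prefix, snippets, query

    first = build_prompt("order status")
    second = build_prompt("shipping times")
    # The cold prefix is computed once and reused; only the hot retrieval repeats.
    ```

    In practice the cached prefix maps onto provider prompt-caching features (e.g. marking a long static prefix as cacheable) rather than a hand-rolled dictionary.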

  10. Article
    Giant Swarm·22w

    Infrastructure for AI is finally getting a standard

    The CNCF launched the Kubernetes AI Conformance Program at KubeCon North America, establishing the first standardized baseline for running AI/ML workloads on Kubernetes. Giant Swarm became one of the first platforms to receive certification, addressing the fragmentation in AI infrastructure that has plagued organizations as they move from experimental models to production. The standard defines consistent capabilities, APIs, and configurations needed for reliable AI/ML workloads, with research showing 82% of organizations building custom AI solutions and 58% using Kubernetes. The certification provides teams with confidence in their infrastructure choices, backed by major industry players like Bloomberg, Zalando, OpenAI, NVIDIA, and Apple already using Kubernetes-based platforms for AI workloads.

  11. Article
    Sebastian Raschka·23w

    Beyond Standard LLMs

    Explores alternatives to standard autoregressive transformer LLMs: linear attention hybrids like Qwen3-Next and Kimi Linear that use Gated DeltaNet for improved efficiency, text diffusion models that generate tokens in parallel through iterative denoising, code world models that simulate program execution for better code understanding, and small recursive transformers like TRM that iteratively refine their answers. While traditional transformer LLMs remain state-of-the-art, these alternatives offer promising trade-offs between efficiency and performance for specific use cases.

  12. Article
    vLLM·21w

    Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale

    vLLM introduces Signal-Decision Architecture, a new approach to semantic routing that replaces fixed classification-based routing with multi-dimensional signal extraction. The architecture combines keyword, embedding, and domain signals with flexible AND/OR logic to enable unlimited routing decisions. It includes built-in plugins for caching, security, and compliance, and uses Kubernetes CRDs for cloud-native deployment. This enables enterprises to scale from 14 fixed categories to hundreds of specialized routing rules with priority-based selection and plugin orchestration.
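
    The multi-signal AND/OR combination can be illustrated with a toy rule evaluator; the signal extractors, rule fields, and model names below are invented for illustration and do not reflect vLLM's actual plugin API:

    ```python
    def extract_signals(query):
        # Toy extractors standing in for keyword, embedding, and domain signals.
        return {
            "keyword": "deploy" in query.lower(),
            "embedding": len(query) > 40,  # stand-in for a similarity threshold
            "domain": query.lower().startswith("k8s"),
        }

    RULES = [  # evaluated in priority order; each combines signals with AND/OR
        {"name": "ops-model", "all_of": ["keyword", "domain"], "any_of": []},
        {"name": "long-form-model", "all_of": [], "any_of": ["embedding"]},
        {"name": "default-model", "all_of": [], "any_of": []},  # catch-all
    ]

    def route(query):
        s = extract_signals(query)
        for rule in RULES:
            must = all(s[k] for k in rule["all_of"])
            may = not rule["any_of"] or any(s[k] for k in rule["any_of"])
            if must and may:
                return rule["name"]

    decision = route("k8s how do I deploy?")
    ```

    Priority-based selection falls out of the rule ordering: the first matching rule wins, and the catch-all guarantees a decision.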

  13. Article
    Google Open Source Blog·23w

    Announcing Magika 1.0: now faster, smarter, and rebuilt in Rust

    Google released Magika 1.0, an AI-powered file type detection system completely rewritten in Rust. The stable release doubles file type support to over 200 formats, including specialized types for data science, modern programming languages, and DevOps configurations. The new Rust engine processes hundreds of files per second on a single core using ONNX Runtime and Tokio. Training challenges were addressed using SedPack for handling 3TB datasets and Gemini for generating synthetic samples of rare file types. Available as a native CLI tool and library for Python, TypeScript, and Rust.

  14. Article
    The New Stack·23w

    OpenAI Co-Founder: AI Agents Are Still 10 Years Away

    OpenAI co-founder Andrej Karpathy predicts AI agents are still a decade away from replacing human workers, despite recent progress with large language models. He argues the industry is over-hyping current capabilities, citing issues like lack of multimodal functionality, continual learning, and the significant demo-to-product gap. Karpathy draws from his experience leading Tesla's self-driving efforts to illustrate how difficult it is to move from working demos to production-ready systems. He's now focusing on AI education through Eureka Labs, releasing projects like nanochat to help developers understand LLM implementation from the ground up.

  15. Article
    ecosystem.Ai·22w

    The myth of AI automation

    Using the historical example of the Mechanical Turk chess automaton from 1770, this piece argues that AI automation is a myth and that human intelligence remains essential for AI systems to function effectively. The narrative challenges common fears about AI replacing human workers, emphasizing that AI tools require human guidance and expertise to deliver meaningful results.

  16. Article
    Medium·22w

    Kimi K2 Thinking: Best Agentic Reasoning LLM is here, beats GPT-5, Sonnet 4.5

    Moonshot AI released Kimi K2 Thinking, an open-source LLM that uses test-time scaling to perform extended reasoning chains with up to 300 tool calls per session. Unlike traditional models that scale parameters, K2 scales the number of reasoning steps, maintaining coherence across long chains while integrating web search, code execution, and documentation reading. The model achieves strong results on complex benchmarks like Humanity's Last Exam (44.9%) and SWE-Bench Verified (71.3%) through agentic reasoning. It uses INT4 quantization-aware training for efficiency and offers a Heavy Mode that runs eight parallel reasoning trajectories. K2 represents a shift from word prediction to sustained, tool-augmented cognition.

  17. Article
    ByteByteGo·23w

    How Perplexity Built an AI Google

    Perplexity AI built an answer engine that combines real-time web search with large language models through a Retrieval-Augmented Generation (RAG) pipeline. The architecture uses Vespa AI for web-scale indexing and retrieval across 200 billion URLs, a model-agnostic orchestration layer that routes queries to appropriate LLMs (both proprietary Sonar models and third-party models like GPT and Claude), and a custom ROSE inference engine running on NVIDIA H100 GPUs. The system processes queries through five stages: intent parsing, live web retrieval, snippet extraction, answer generation with citations, and conversational refinement. This approach addresses AI hallucination issues by grounding responses in verifiable sources while maintaining low latency and cost efficiency through intelligent model routing and infrastructure optimization.
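
    The five stages can be sketched as a linear pipeline of stubs; every function body below is a placeholder, not Perplexity's implementation:

    ```python
    def parse_intent(query):
        return {"query": query, "intent": "factual"}       # stage 1: intent parsing

    def retrieve(intent):
        # Stage 2: live web retrieval (stub standing in for the Vespa index).
        return [{"url": "https://example.com/a", "text": "..."}]

    def extract_snippets(docs):
        return [(d["url"], d["text"][:200]) for d in docs]  # stage 3: snippet extraction

    def generate_answer(query, snippets):
        # Stage 4: answer generation grounded in the snippets, with citations.
        cites = [url for url, _ in snippets]
        return {"answer": f"Answer to {query!r}", "citations": cites}

    def refine(history, answer):
        history.append(answer)                              # stage 5: conversational refinement
        return history

    history = []
    query = "what is RAG?"
    answer = generate_answer(query, extract_snippets(retrieve(parse_intent(query))))
    history = refine(history, answer)
    ```

    The grounding claim in the summary lives in stage 4: the answer carries citations back to the retrieved sources, which is what makes hallucinations checkable.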

  18. Article
    GitHub Blog·20w

    Why developers still flock to Python: Guido van Rossum on readability, AI, and the future of programming

    Python creator Guido van Rossum discusses why Python grew 49% year-over-year in 2025 despite TypeScript overtaking it as GitHub's most-used language. He explains Python's origins as a practical tool between C's complexity and shell scripting limitations, its dominance in AI through ecosystem gravity (NumPy, PyTorch, pandas), and why the language doesn't need stricter typing for LLMs. Van Rossum emphasizes Python's core strengths: readability, approachability, and backward compatibility through features like soft keywords. He highlights how Python democratizes programming for non-CS backgrounds and remains the default language for AI, science, and education globally.

  19. Article
    TechCentral·23w

    China’s DeepSeek warns of social upheaval from AI

    DeepSeek senior researcher Chen Deli made a rare public appearance at China's World Internet Conference, expressing concerns about AI's long-term societal impact. While optimistic about the technology itself, Chen warned that AI could cause widespread job displacement within 5-10 years and massive social challenges in 10-20 years. DeepSeek gained global attention in January for releasing a low-cost AI model that outperformed leading US models. The company upgraded its V3 model in September and has become central to China's efforts to build a domestic AI ecosystem, with Chinese chip makers like Cambricon and Huawei developing hardware compatible with DeepSeek's models.

  20. Article
    System Design Newsletter·23w

    A Beginner’s Field Guide to Large Language Models: From Tokens to Agents

    Comprehensive beginner's guide explaining 33 fundamental LLM concepts without mathematics. Covers core mechanics like tokens, embeddings, and parameters; training processes including pre-training and fine-tuning; interaction patterns through prompts and context windows; architectural extensions like RAG and agentic AI; model types and deployment options; performance measurement through benchmarks and metrics; and common failure modes like hallucination and bias with their mitigation strategies. Emphasizes practical understanding over technical depth to help readers use LLMs effectively and recognize their limitations.
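
    Two of those core concepts, tokens and the context window, can be made concrete with a toy whitespace tokenizer (real models use subword tokenizers such as BPE; this is purely illustrative):

    ```python
    def tokenize(text):
        # Toy word-level tokenizer; production LLMs split into subword tokens.
        return text.split()

    def fit_context(tokens, window=8):
        # The context window bounds how many tokens the model sees at once;
        # the oldest tokens fall off the front.
        return tokens[-window:]

    tokens = tokenize("the quick brown fox jumps over the lazy dog again and again")
    kept = fit_context(tokens, window=8)
    # Only the most recent 8 tokens remain in context.
    ```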

  21. Video
    Theo - t3․gg·20w

    NVIDIA's first real competition (Google is KILLING it)

    Google announced its seventh-generation TPU (Ironwood), claiming 10x performance improvements over previous versions and positioning itself as a serious competitor to Nvidia in AI accelerator hardware. Meta is reportedly in talks with Google for a multi-billion dollar chip deal starting in 2027, causing Nvidia's stock to drop 4% and wiping $112 billion off its market cap. Google is the only major tech company operating across all AI layers: applications (Google Search), foundation models (Gemini), cloud inference (GCP), and custom accelerator hardware (TPUs). The move represents a strategic shift as companies seek alternatives to Nvidia's dominant position and pricing in the GPU market, with Google leveraging its vertical integration and custom silicon expertise to challenge the status quo.

  22. Article
    Javarevisited·23w

    Top 9 Books to Learn RAG and AI Agents in 2025

    A curated collection of 9 technical books for learning Retrieval-Augmented Generation (RAG) and AI agent development. Covers foundational topics like data engineering, statistics, and NLP transformers, then progresses to production-focused ML system design, LLM engineering, and frameworks like LangChain. Emphasizes practical, production-ready knowledge from industry experts including Chip Huyen's works on ML system design and AI engineering, alongside hands-on guides for building and deploying LLM-powered applications.

  23. Article
    freeCodeCamp·22w

    How To Run an Open-Source LLM on Your Personal Computer – Run Ollama Locally

    Learn how to run open-source large language models like Llama, Mistral, and Gemma locally on your personal computer using Ollama. The guide covers installation through both GUI and command-line interfaces, explains how to download and manage models, integrate them into applications via local API endpoints, and troubleshoot common issues. Running LLMs locally provides privacy, offline functionality, and eliminates cloud API costs while giving developers full control over AI capabilities.
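
    Once Ollama is running, its local API listens on port 11434; a minimal stdlib sketch of a client (model name and prompt are examples; pull the model first with `ollama pull llama3`) might look like:

    ```python
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def build_request(model, prompt):
        # Ollama's /api/generate endpoint takes a JSON body;
        # stream=False returns a single response object instead of chunks.
        payload = {"model": model, "prompt": prompt, "stream": False}
        return urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )

    def generate(model, prompt):
        with urllib.request.urlopen(build_request(model, prompt)) as resp:
            return json.loads(resp.read())["response"]

    # Example (requires a running Ollama server):
    # print(generate("llama3", "Why is the sky blue?"))
    ```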

  24. Article
    freeCodeCamp·22w

    Learn Discrete Mathematics

    A comprehensive 9-hour course covering discrete mathematics fundamentals essential for computer science, including combinatorics, number theory, prime numbers, graph theory, and their applications in machine learning and algorithms. The course includes practical Python implementations using itertools, explores key concepts like permutations, binomial coefficients, modular arithmetic, and advanced topics such as Stirling numbers and the Chinese remainder theorem.
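
    Several of those topics map directly onto Python's standard library; a small sampler (not the course's own code):

    ```python
    import itertools
    import math

    # Combinatorics: permutations and binomial coefficients.
    perms = list(itertools.permutations("abc"))  # 3! = 6 orderings
    n_choose_k = math.comb(5, 2)                 # C(5, 2) = 10

    # Modular arithmetic: modular inverse via three-argument pow.
    inv = pow(3, -1, 7)                          # 3 * inv ≡ 1 (mod 7), so inv = 5

    # Chinese remainder theorem: solve x ≡ 2 (mod 3) and x ≡ 3 (mod 5).
    def crt(r1, m1, r2, m2):
        # Combine two congruences with coprime moduli into one modulo m1*m2.
        return (r1 + m1 * ((r2 - r1) * pow(m1, -1, m2) % m2)) % (m1 * m2)

    x = crt(2, 3, 3, 5)                          # x = 8: 8 % 3 == 2, 8 % 5 == 3
    ```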