Best of Machine Learning: November 2025

  1. Article
    TechCrunch·21w

    Hugging Face CEO says we’re in an ‘LLM bubble,’ not an ‘AI bubble’

    Hugging Face CEO Clem Delangue argues the tech industry is in an LLM bubble rather than a broader AI bubble, and predicts it may burst soon. He believes the current focus on large, general-purpose language models is misplaced and that smaller, specialized models will win for specific use cases such as banking chatbots. While competitors spend billions on LLM infrastructure, Hugging Face maintains a capital-efficient approach, with half of its $400 million in funding still in reserve, positioning itself for the long term across a diversified AI landscape.

  2. Article
    Simon Willison·21w

    Olmo 3 is a fully open LLM

    Ai2 released Olmo 3, a fully open LLM series that includes complete training data, process, and checkpoints. The flagship 32B Think model emphasizes interpretability with visible reasoning traces through OlmoTrace. Trained on 5.9 trillion tokens from the Dolma 3 Mix dataset (6x fewer tokens than competitors), it offers four 7B variants and two 32B models. The release enables auditing training data to detect potential backdoors, addressing security concerns in open-weight models. Performance testing shows improved SVG generation compared to Olmo 2, though OlmoTrace's training data attribution needs refinement.

  3. Article
    InfoWorld·23w

    Perplexity’s open-source tool to run trillion-parameter models without costly upgrades

    Perplexity AI released TransferEngine, an open-source tool that enables trillion-parameter language models to run across different cloud providers' GPU hardware at full speed. The software solves vendor lock-in by creating a universal interface for GPU-to-GPU communication that works on both Nvidia ConnectX and AWS EFA networking protocols. This allows companies to run massive models like DeepSeek V3 and Kimi K2 on older H100 and H200 systems instead of purchasing expensive next-generation hardware. TransferEngine achieves 400 Gbps throughput using RDMA technology and is already powering Perplexity's production AI search engine, handling disaggregated inference, reinforcement learning, and Mixture-of-Experts routing.

  4. Article
    Towards Data Science·23w

    We Didn’t Invent Attention — We Just Rediscovered It

    Attention mechanisms in AI transformers aren't novel inventions but rediscoveries of fundamental optimization principles. The same mathematical pattern—selective amplification combined with normalization—emerges independently across evolution (500+ million years of neural systems), chemistry (autocatalytic reactions), and AI (gradient descent). This convergence suggests attention represents a universal solution to information processing under energy constraints. Reframing attention as amplification rather than selection offers practical insights for improving AI architectures: decoupling amplification from normalization, exploring non-content-based amplification, implementing local normalization pools, and designing systems that operate at critical dynamics for optimal information processing.
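
    The amplify-then-normalize pattern described above is exactly what softmax attention computes; this toy pure-Python sketch (illustrative, not from the article) makes the two steps explicit:

    ```python
    import math

    def attention_weights(scores):
        """Softmax as selective amplification (exp) followed by normalization."""
        amplified = [math.exp(s) for s in scores]  # amplification step
        total = sum(amplified)                     # normalization pool
        return [a / total for a in amplified]

    def attend(query_scores, values):
        """Mix values under the amplified-and-normalized weights."""
        w = attention_weights(query_scores)
        return sum(wi * vi for wi, vi in zip(w, values))

    weights = attention_weights([2.0, 1.0, 0.1])
    # Higher scores are amplified disproportionately, yet the weights sum to 1.
    ```

    The article's suggestion to decouple amplification from normalization amounts to varying those two steps independently, e.g. replacing the global `total` with a local normalization pool.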

  5. Article
    The New Stack·22w

    Why the Frontend Should Run AI Models Locally With ONNX

    Running AI models locally in the browser using ONNX Runtime Web offers significant advantages over cloud-based approaches. Local execution eliminates privacy concerns by keeping sensitive data on-device, enables offline functionality, and provides instant feedback loops. ONNX acts as a universal format for ML models, allowing models trained in PyTorch or TensorFlow to run anywhere via JavaScript. Angular's Signals feature (v16+) provides the performance isolation needed for heavy inference operations. The approach enables mixing local models for low-latency tasks with cloud calls for complex reasoning, while maintaining transparency about data handling.

  6. Article
    Databricks·23w

    Building Custom LLM Judges for AI Agent Accuracy

    MLflow introduces three new capabilities for evaluating AI agents: Tunable Judges for creating custom LLM evaluators using natural language instructions, Agent-as-a-Judge for automatically identifying relevant trace data without manual parsing, and Judge Builder for visual judge management with domain expert feedback. These tools enable teams to build domain-specific evaluation criteria, align judges with human feedback through continuous tuning, and scale quality assessment from prototype to production. The make_judge SDK simplifies creating custom judges, while alignment optimization incorporates subject matter expert feedback to improve evaluation accuracy over time.

  7. Article
    Aishwary Gupta·21w

    OpenAI dropped a cookbook on Self-Evolving Agents

    OpenAI released a comprehensive cookbook featuring open-source examples and tutorials for building applications with their API. The collection covers fundamental API usage through advanced implementations including fine-tuning, RAG, function calling, vector databases, multimodal applications, and self-evolving agent development. Practical guides span GPT models, embeddings, image generation, speech processing, and platform integrations.

  8. Article
    Cloudflare·21w

    Replicate is joining Cloudflare

    Cloudflare acquires Replicate, a platform for running AI models with 50,000+ models in its catalog. The integration will bring Replicate's model catalog and fine-tuning capabilities to Cloudflare's Workers AI platform, while maintaining existing APIs for current users. The combined platform aims to provide serverless GPU inference on Cloudflare's global network, unified model management through AI Gateway, and seamless integration with Cloudflare's developer tools including Workers, R2, Vectorize, and Durable Objects.

  9. Article
    Daily Dose of Data Science | Avi Chawla | Substack·23w

    RAG vs. CAG, Explained Visually!

    Cache-Augmented Generation (CAG) improves upon traditional RAG by caching static, rarely-changing information directly in the model's key-value memory, while continuing to retrieve dynamic data from vector databases. This hybrid approach reduces redundant fetches, lowers costs, and speeds up inference by separating stable "cold" data (cacheable) from frequently updated "hot" data (retrievable). The technique is already supported by APIs like OpenAI and Anthropic through prompt caching features.
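
    A minimal sketch of the hot/cold split, with an invented `vector_search` stub and a string standing in for the cached key-value state (illustrative only, not the article's code):

    ```python
    STATIC_CONTEXT = "Company policies: refunds accepted within 30 days..."  # cold data

    _kv_cache = {}

    def cached_prefix(doc):
        # CAG: the static context is processed once and its state is reused.
        if doc not in _kv_cache:
            _kv_cache[doc] = f"<precomputed KV state for {len(doc)} chars>"
        return _kv_cache[doc]

    def vector_search(query):
        # RAG: hot, frequently changing data still comes from a vector store (stub).
        return [f"fresh snippet for {query!r}"]

    def build_prompt(query):
        prefix = cached_prefix(STATIC_CONTEXT)  # cold path: cache hit after first call
        snippets = vector_search(query)         # hot path: retrieved on every query
        return prefix, snippets, query

    first = build_prompt("order status")
    second = build_prompt("shipping times")
    # The cold prefix is computed once and reused; only the hot retrieval repeats.
    ```

    In practice the cached prefix maps onto provider prompt-caching features (e.g. marking a long static prefix as cacheable) rather than a hand-rolled dictionary.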

  10. Article
    Giant Swarm·22w

    Infrastructure for AI is finally getting a standard

    The CNCF launched the Kubernetes AI Conformance Program at KubeCon North America, establishing the first standardized baseline for running AI/ML workloads on Kubernetes. Giant Swarm became one of the first platforms to receive certification, addressing the fragmentation in AI infrastructure that has plagued organizations as they move from experimental models to production. The standard defines consistent capabilities, APIs, and configurations needed for reliable AI/ML workloads, with research showing 82% of organizations building custom AI solutions and 58% using Kubernetes. The certification provides teams with confidence in their infrastructure choices, backed by major industry players like Bloomberg, Zalando, OpenAI, NVIDIA, and Apple already using Kubernetes-based platforms for AI workloads.

  11. Article
    Sebastian Raschka·23w

    Beyond Standard LLMs

    Explores alternatives to standard autoregressive transformer LLMs: linear attention hybrids like Qwen3-Next and Kimi Linear that use Gated DeltaNet for improved efficiency, text diffusion models that generate tokens in parallel through iterative denoising, code world models that simulate program execution for better code understanding, and small recursive transformers like TRM that iteratively refine their answers. While traditional transformer LLMs remain state-of-the-art, these alternatives offer promising trade-offs between efficiency and performance for specific use cases.

  12. Article
    vLLM·21w

    Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale

    vLLM introduces Signal-Decision Architecture, a new approach to semantic routing that replaces fixed classification-based routing with multi-dimensional signal extraction. The architecture combines keyword, embedding, and domain signals with flexible AND/OR logic to enable unlimited routing decisions. It includes built-in plugins for caching, security, and compliance, and uses Kubernetes CRDs for cloud-native deployment. This enables enterprises to scale from 14 fixed categories to hundreds of specialized routing rules with priority-based selection and plugin orchestration.
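
    The multi-signal AND/OR combination can be illustrated with a toy rule evaluator; the signal extractors, rule fields, and model names below are invented for illustration and do not reflect vLLM's actual plugin API:

    ```python
    def extract_signals(query):
        # Toy extractors standing in for keyword, embedding, and domain signals.
        return {
            "keyword": "deploy" in query.lower(),
            "embedding": len(query) > 40,  # stand-in for a similarity threshold
            "domain": query.lower().startswith("k8s"),
        }

    RULES = [  # evaluated in priority order; each combines signals with AND/OR
        {"name": "ops-model", "all_of": ["keyword", "domain"], "any_of": []},
        {"name": "long-form-model", "all_of": [], "any_of": ["embedding"]},
        {"name": "default-model", "all_of": [], "any_of": []},  # catch-all
    ]

    def route(query):
        s = extract_signals(query)
        for rule in RULES:
            must = all(s[k] for k in rule["all_of"])
            may = not rule["any_of"] or any(s[k] for k in rule["any_of"])
            if must and may:
                return rule["name"]

    decision = route("k8s how do I deploy?")
    ```

    Priority-based selection falls out of the rule ordering: the first matching rule wins, and the catch-all guarantees a decision.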

  13. Article
    Google Open Source Blog·23w

    Announcing Magika 1.0: now faster, smarter, and rebuilt in Rust

    Google released Magika 1.0, an AI-powered file type detection system completely rewritten in Rust. The stable release doubles file type support to over 200 formats, including specialized types for data science, modern programming languages, and DevOps configurations. The new Rust engine processes hundreds of files per second on a single core using ONNX Runtime and Tokio. Training challenges were addressed using SedPack for handling 3TB datasets and Gemini for generating synthetic samples of rare file types. Available as a native CLI tool and library for Python, TypeScript, and Rust.

  14. Article
    The New Stack·23w

    OpenAI Co-Founder: AI Agents Are Still 10 Years Away

    OpenAI co-founder Andrej Karpathy predicts AI agents are still a decade away from replacing human workers, despite recent progress with large language models. He argues the industry is over-hyping current capabilities, citing issues like lack of multimodal functionality, continual learning, and the significant demo-to-product gap. Karpathy draws from his experience leading Tesla's self-driving efforts to illustrate how difficult it is to move from working demos to production-ready systems. He's now focusing on AI education through Eureka Labs, releasing projects like nanochat to help developers understand LLM implementation from the ground up.

  15. Article
    ecosystem.Ai·22w

    The myth of AI automation

    Using the historical example of the Mechanical Turk chess automaton from 1770, this piece argues that AI automation is a myth and that human intelligence remains essential for AI systems to function effectively. The narrative challenges common fears about AI replacing human workers, emphasizing that AI tools require human guidance and expertise to deliver meaningful results.

  16. Article
    Medium·22w

    Kimi K2 Thinking: Best Agentic Reasoning LLM is here, beats GPT-5, Sonnet 4.5

    Moonshot AI released Kimi K2 Thinking, an open-source LLM that uses test-time scaling to perform extended reasoning chains with up to 300 tool calls per session. Unlike traditional models that scale parameters, K2 scales the number of reasoning steps, maintaining coherence across long chains while integrating web search, code execution, and documentation reading. The model achieves strong results on complex benchmarks like Humanity's Last Exam (44.9%) and SWE-Bench Verified (71.3%) through agentic reasoning. It uses INT4 quantization-aware training for efficiency and offers a Heavy Mode that runs eight parallel reasoning trajectories. K2 represents a shift from word prediction to sustained, tool-augmented cognition.

  17. Article
    ByteByteGo·23w

    How Perplexity Built an AI Google

    Perplexity AI built an answer engine that combines real-time web search with large language models through a Retrieval-Augmented Generation (RAG) pipeline. The architecture uses Vespa AI for web-scale indexing and retrieval across 200 billion URLs, a model-agnostic orchestration layer that routes queries to appropriate LLMs (both proprietary Sonar models and third-party models like GPT and Claude), and a custom ROSE inference engine running on NVIDIA H100 GPUs. The system processes queries through five stages: intent parsing, live web retrieval, snippet extraction, answer generation with citations, and conversational refinement. This approach addresses AI hallucination issues by grounding responses in verifiable sources while maintaining low latency and cost efficiency through intelligent model routing and infrastructure optimization.
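
    The five stages can be sketched as a linear pipeline of stubs; every function body below is a placeholder, not Perplexity's implementation:

    ```python
    def parse_intent(query):
        return {"query": query, "intent": "factual"}       # stage 1: intent parsing

    def retrieve(intent):
        # Stage 2: live web retrieval (stub standing in for the Vespa index).
        return [{"url": "https://example.com/a", "text": "..."}]

    def extract_snippets(docs):
        return [(d["url"], d["text"][:200]) for d in docs]  # stage 3: snippet extraction

    def generate_answer(query, snippets):
        # Stage 4: answer generation grounded in the snippets, with citations.
        cites = [url for url, _ in snippets]
        return {"answer": f"Answer to {query!r}", "citations": cites}

    def refine(history, answer):
        history.append(answer)                              # stage 5: conversational refinement
        return history

    history = []
    query = "what is RAG?"
    answer = generate_answer(query, extract_snippets(retrieve(parse_intent(query))))
    history = refine(history, answer)
    ```

    The grounding claim in the summary lives in stage 4: the answer carries citations back to the retrieved sources, which is what makes hallucinations checkable.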

  18. Article
    GitHub Blog·20w

    Why developers still flock to Python: Guido van Rossum on readability, AI, and the future of programming

    Python creator Guido van Rossum discusses why Python grew 49% year-over-year in 2025 despite TypeScript overtaking it as GitHub's most-used language. He explains Python's origins as a practical tool between C's complexity and shell scripting limitations, its dominance in AI through ecosystem gravity (NumPy, PyTorch, pandas), and why the language doesn't need stricter typing for LLMs. Van Rossum emphasizes Python's core strengths: readability, approachability, and backward compatibility through features like soft keywords. He highlights how Python democratizes programming for non-CS backgrounds and remains the default language for AI, science, and education globally.

  19. Article
    TechCentral·23w

    China’s DeepSeek warns of social upheaval from AI

    DeepSeek senior researcher Chen Deli made a rare public appearance at China's World Internet Conference, expressing concerns about AI's long-term societal impact. While optimistic about the technology itself, Chen warned that AI could cause widespread job displacement within 5-10 years and massive social challenges in 10-20 years. DeepSeek gained global attention in January for releasing a low-cost AI model that outperformed leading US models. The company upgraded its V3 model in September and has become central to China's efforts to build a domestic AI ecosystem, with Chinese chip makers like Cambricon and Huawei developing hardware compatible with DeepSeek's models.

  20. Article
    System Design Newsletter·23w

    A Beginner’s Field Guide to Large Language Models: From Tokens to Agents

    Comprehensive beginner's guide explaining 33 fundamental LLM concepts without mathematics. Covers core mechanics like tokens, embeddings, and parameters; training processes including pre-training and fine-tuning; interaction patterns through prompts and context windows; architectural extensions like RAG and agentic AI; model types and deployment options; performance measurement through benchmarks and metrics; and common failure modes like hallucination and bias with their mitigation strategies. Emphasizes practical understanding over technical depth to help readers use LLMs effectively and recognize their limitations.
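
    Two of those core concepts, tokens and the context window, can be made concrete with a toy whitespace tokenizer (real models use subword tokenizers such as BPE; this is purely illustrative):

    ```python
    def tokenize(text):
        # Toy word-level tokenizer; production LLMs split into subword tokens.
        return text.split()

    def fit_context(tokens, window=8):
        # The context window bounds how many tokens the model sees at once;
        # the oldest tokens fall off the front.
        return tokens[-window:]

    tokens = tokenize("the quick brown fox jumps over the lazy dog again and again")
    kept = fit_context(tokens, window=8)
    # Only the most recent 8 tokens remain in context.
    ```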

  21. Video
    Theo - t3․gg·20w

    NVIDIA's first real competition (Google is KILLING it)

    Google announced its seventh-generation TPU (Ironwood), claiming 10x performance improvements over previous versions and positioning itself as a serious competitor to Nvidia in AI accelerator hardware. Meta is reportedly in talks with Google for a multi-billion dollar chip deal starting in 2027, causing Nvidia's stock to drop 4% and wiping $112 billion off its market cap. Google is the only major tech company operating across all AI layers: applications (Google Search), foundation models (Gemini), cloud inference (GCP), and custom accelerator hardware (TPUs). The move represents a strategic shift as companies seek alternatives to Nvidia's dominant position and pricing in the GPU market, with Google leveraging its vertical integration and custom silicon expertise to challenge the status quo.

  22. Article
    Javarevisited·23w

    Top 9 Books to Learn RAG and AI Agents in 2025

    A curated collection of 9 technical books for learning Retrieval-Augmented Generation (RAG) and AI agent development. Covers foundational topics like data engineering, statistics, and NLP transformers, then progresses to production-focused ML system design, LLM engineering, and frameworks like LangChain. Emphasizes practical, production-ready knowledge from industry experts including Chip Huyen's works on ML system design and AI engineering, alongside hands-on guides for building and deploying LLM-powered applications.

  23. Article
    freeCodeCamp·22w

    How To Run an Open-Source LLM on Your Personal Computer – Run Ollama Locally

    Learn how to run open-source large language models like Llama, Mistral, and Gemma locally on your personal computer using Ollama. The guide covers installation through both GUI and command-line interfaces, explains how to download and manage models, integrate them into applications via local API endpoints, and troubleshoot common issues. Running LLMs locally provides privacy, offline functionality, and eliminates cloud API costs while giving developers full control over AI capabilities.
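
    Once Ollama is running, its local API listens on port 11434; a minimal stdlib sketch of a client (model name and prompt are examples; pull the model first with `ollama pull llama3`) might look like:

    ```python
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"

    def build_request(model, prompt):
        # Ollama's /api/generate endpoint takes a JSON body;
        # stream=False returns a single response object instead of chunks.
        payload = {"model": model, "prompt": prompt, "stream": False}
        return urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )

    def generate(model, prompt):
        with urllib.request.urlopen(build_request(model, prompt)) as resp:
            return json.loads(resp.read())["response"]

    # Example (requires a running Ollama server):
    # print(generate("llama3", "Why is the sky blue?"))
    ```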

  24. Article
    freeCodeCamp·22w

    Learn Discrete Mathematics

    A comprehensive 9-hour course covering discrete mathematics fundamentals essential for computer science, including combinatorics, number theory, prime numbers, graph theory, and their applications in machine learning and algorithms. The course includes practical Python implementations using itertools, explores key concepts like permutations, binomial coefficients, modular arithmetic, and advanced topics such as Stirling numbers and the Chinese remainder theorem.
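
    Several of those topics map directly onto Python's standard library; a small sampler (not the course's own code):

    ```python
    import itertools
    import math

    # Combinatorics: permutations and binomial coefficients.
    perms = list(itertools.permutations("abc"))  # 3! = 6 orderings
    n_choose_k = math.comb(5, 2)                 # C(5, 2) = 10

    # Modular arithmetic: modular inverse via three-argument pow.
    inv = pow(3, -1, 7)                          # 3 * inv ≡ 1 (mod 7), so inv = 5

    # Chinese remainder theorem: solve x ≡ 2 (mod 3) and x ≡ 3 (mod 5).
    def crt(r1, m1, r2, m2):
        # Combine two congruences with coprime moduli into one modulo m1*m2.
        return (r1 + m1 * ((r2 - r1) * pow(m1, -1, m2) % m2)) % (m1 * m2)

    x = crt(2, 3, 3, 5)                          # x = 8: 8 % 3 == 2, 8 % 5 == 3
    ```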