Best of LLM · November 2025

  1. Article
    TechCrunch · 23w

    Hugging Face CEO says we’re in an ‘LLM bubble,’ not an ‘AI bubble’

    Hugging Face CEO Clem Delangue argues the tech industry is experiencing an LLM bubble rather than a broader AI bubble, predicting it may burst soon. He believes the current focus on large, general-purpose language models is misplaced, and that smaller, specialized models will dominate the future for specific use cases like banking chatbots. While competitors spend billions on LLM infrastructure, Hugging Face maintains a capital-efficient approach with half of its $400 million funding still in reserve, positioning itself for long-term sustainability across the diversified AI landscape.

  2. Article
    Sebastian Raschka · 24w

    Recommendations for Getting the Most Out of a Technical Book

    A structured five-step approach to learning from technical books: start with an offline read-through to grasp the big picture, follow with hands-on coding by retyping examples, complete exercises to solidify understanding, review notes and explore additional resources, and finally apply concepts in personal projects. The method emphasizes focused reading sessions, active engagement with code, and practical application over passive consumption.

  3. Article
    David Heinemeier Hansson · 22w

    Local LLMs are how nerds now justify a big computer they don't need

    Local LLMs, while technically impressive, still lag significantly behind cloud-based frontier models for practical development work. Despite the hype around running AI models locally, most developers don't actually need expensive high-RAM machines. Budget mini PCs costing around $500 can handle typical development tasks just as well as premium $2,000+ workstations, especially when running Linux. This is fortunate timing given the current spike in RAM prices driven by AI's resource demands.

  4. Article
    Simon Willison · 23w

    Olmo 3 is a fully open LLM

    Ai2 released Olmo 3, a fully open LLM series that includes complete training data, process, and checkpoints. The flagship 32B Think model emphasizes interpretability with visible reasoning traces through OlmoTrace. Trained on 5.9 trillion tokens from the Dolma 3 Mix dataset (6x fewer tokens than competitors), it offers four 7B variants and two 32B models. The release enables auditing training data to detect potential backdoors, addressing security concerns in open-weight models. Performance testing shows improved SVG generation compared to Olmo 2, though OlmoTrace's training data attribution needs refinement.

  5. Article
    The Developing Dev · 23w

    What Would You Automate if It Was Free?

    LLM-powered code generation tools have made automation practically free, changing the cost-benefit calculation for repetitive tasks. The author shares practical examples of using AI to generate scripts for podcast transcript processing, video file stitching with ffmpeg, and converting notes to Markdown. These tools enable automation of even one-off tasks that previously weren't worth the manual effort, fundamentally changing how developers approach small, repetitive work.
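The video-stitching example above typically boils down to generating a file list for ffmpeg's concat demuxer and running one command. A minimal sketch of that kind of throwaway script (filenames are hypothetical):

```python
from pathlib import Path

def build_concat_list(clips, list_path="clips.txt"):
    """Write an ffmpeg concat-demuxer file listing the clips in order."""
    lines = [f"file '{Path(c).as_posix()}'" for c in clips]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return list_path

# The stitch itself is then a single ffmpeg invocation:
#   ffmpeg -f concat -safe 0 -i clips.txt -c copy output.mp4
list_file = build_concat_list(["intro.mp4", "demo.mp4", "outro.mp4"])
print(Path(list_file).read_text())
```

Scripts like this are exactly the one-off glue that used to cost more effort than it was worth.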

  6. Article
    Databricks · 25w

    Building Custom LLM Judges for AI Agent Accuracy

    MLflow introduces three new capabilities for evaluating AI agents: Tunable Judges for creating custom LLM evaluators using natural language instructions, Agent-as-a-Judge for automatically identifying relevant trace data without manual parsing, and Judge Builder for visual judge management with domain expert feedback. These tools enable teams to build domain-specific evaluation criteria, align judges with human feedback through continuous tuning, and scale quality assessment from prototype to production. The make_judge SDK simplifies creating custom judges, while alignment optimization incorporates subject matter expert feedback to improve evaluation accuracy over time.

  7. Article
    ByteByteGo · 24w

    How Uber Built a Conversational AI Agent For Financial Analysis

    Uber built Finch, a conversational AI agent that enables finance teams to query financial data using natural language directly in Slack. The system translates questions into SQL queries, retrieves data from curated single-table data marts, and returns results in seconds. Finch uses a modular architecture with specialized agents orchestrated by LangGraph, OpenSearch for semantic mapping, and role-based access controls for security. The system includes continuous evaluation against golden queries, performance optimizations through parallel processing and pre-fetching, and plans to expand with deeper FinTech integration and human-in-the-loop validation for executive decisions.

  8. Article
    Barion · 22w

    I have never seen good AI code (challenge)

    A developer challenges the community to prove that LLM-generated code can meet professional standards for production use. They're specifically looking for examples that demonstrate a good balance between readability, maintainability, reusability, simplicity, and performance, with a preference for immutable and declarative design patterns. The author uses AI daily but claims to have never encountered AI-generated code worthy of inclusion in long-term projects.

  9. Article
    Valdemar · 24w

    Found a wild tool - Everywhere

    Everywhere is a desktop AI assistant that runs locally and integrates with multiple LLM providers including GPT, Claude, Gemini, and DeepSeek. It offers context-aware features like file reading, page summarization, code assistance, and trip planning directly from your desktop environment.

  10. Article
    Aishwary Gupta · 23w

    OpenAI dropped a cookbook on Self-Evolving Agents

    OpenAI released a comprehensive cookbook featuring open-source examples and tutorials for building applications with their API. The collection covers fundamental API usage through advanced implementations including fine-tuning, RAG, function calling, vector databases, multimodal applications, and self-evolving agent development. Practical guides span GPT models, embeddings, image generation, speech processing, and platform integrations.

  11. Video
    Theo - t3.gg · 24w

    Anthropic admits that MCP sucks

    Anthropic published guidance showing that code execution is 98.7% more efficient than their Model Context Protocol (MCP) specification for AI agents. The article demonstrates how writing code to interact with MCP servers reduces token usage from 150,000 to 2,000 tokens by avoiding context window bloat from tool definitions and intermediate results. This approach enables on-demand tool loading, data filtering before reaching the model, and better privacy controls, though it requires secure sandboxed execution environments.

  12. Article
    Daily Dose of Data Science (Avi Chawla) · 25w

    RAG vs. CAG, Explained Visually!

    Cache-Augmented Generation (CAG) improves upon traditional RAG by caching static, rarely-changing information directly in the model's key-value memory, while continuing to retrieve dynamic data from vector databases. This hybrid approach reduces redundant fetches, lowers costs, and speeds up inference by separating stable "cold" data (cacheable) from frequently updated "hot" data (retrievable). The technique is already supported by APIs like OpenAI and Anthropic through prompt caching features.
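In practice the cold/hot split maps onto prompt-caching request shapes like Anthropic's, where a stable system block is marked cacheable and retrieved chunks ride along in the user turn. A sketch of the payload only (no network call; model name and texts are placeholders):

```python
# Large, rarely-changing "cold" context that benefits from caching.
STATIC_CONTEXT = "Product manual v3: ... (large, rarely-changing text)"

def build_request(question, dynamic_chunks):
    """Mark cold data cacheable; pass hot, retrieved chunks normally."""
    return {
        "model": "claude-sonnet-example",
        "system": [
            {"type": "text", "text": STATIC_CONTEXT,
             "cache_control": {"type": "ephemeral"}},  # reused across calls
        ],
        "messages": [
            {"role": "user",
             "content": "\n".join(dynamic_chunks) + "\n\n" + question},
        ],
    }

req = build_request("What changed in v3?", ["latest release notes chunk"])
print(req["system"][0]["cache_control"])
```

Subsequent requests that reuse the same cached prefix skip re-processing the cold data, which is where the cost and latency savings come from.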

  13. Article
    Red Hat Developer · 25w

    3 MCP servers you should be using (safely)

    Model Context Protocol (MCP) enables AI models to interact with developer tools and services through standardized servers. Three essential MCP servers are highlighted: Kubernetes for cluster management and diagnostics, Context7 for accessing up-to-date technical documentation, and GitHub for repository interactions. Each server requires careful security configuration, including read-only defaults, human approval for write operations, and minimal access tokens to prevent data exfiltration through prompt injection attacks.

  14. Article
    Vespa Blog · 25w

    LLMs, Vespa, and a side of Summer Debugging

    Two interns built an MCP (Model Context Protocol) server for Vespa, starting with a Python prototype using PyVespa and the MCP SDK, then rewrote it in Java for full container integration. The server enables LLMs to query Vespa applications using natural language through three main tools: schema retrieval, documentation search, and query execution. Key challenges included implementing custom request handlers, managing stateless communication across distributed nodes, and adapting to frequent MCP SDK updates. The final implementation runs as a Vespa component with HTTP transport, providing read-only access to applications both locally and in the cloud.

  15. Article
    Product Hunt · 25w

    Helicone AI: Open-source LLM Observability for Developers

    Helicone is an open-source platform that provides observability and monitoring for AI applications using large language models. It offers a unified API gateway that consolidates access to 100+ models from multiple providers through a single API key, with zero markup fees. Key features include automatic failover, built-in caching, custom rate limits, real-time analytics, and OpenAI SDK compatibility. The platform addresses common challenges like provider outages, rate limiting, and managing multiple API integrations while providing full visibility into performance and costs.

  16. Article
    Sebastian Raschka · 25w

    Beyond Standard LLMs

    Explores alternatives to standard autoregressive transformer LLMs, including linear attention hybrids like Qwen3-Next and Kimi Linear that use Gated DeltaNet for improved efficiency, text diffusion models that generate tokens in parallel through iterative denoising, code world models that simulate program execution for better code understanding, and small recursive transformers like TRM that iteratively refine their answers. While traditional transformer LLMs remain state-of-the-art, these alternatives offer promising trade-offs between efficiency and performance for specific use cases.

  17. Article
    vLLM · 23w

    Signal-Decision Driven Architecture: Reshaping Semantic Routing at Scale

    vLLM introduces Signal-Decision Architecture, a new approach to semantic routing that replaces fixed classification-based routing with multi-dimensional signal extraction. The architecture combines keyword, embedding, and domain signals with flexible AND/OR logic to enable unlimited routing decisions. It includes built-in plugins for caching, security, and compliance, and uses Kubernetes CRDs for cloud-native deployment. This enables enterprises to scale from 14 fixed categories to hundreds of specialized routing rules with priority-based selection and plugin orchestration.
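The multi-signal idea can be illustrated with a minimal rule evaluator; the signal names, rule format, and priority scheme below are illustrative, not vLLM's actual schema:

```python
def extract_signals(query):
    """Toy signal extraction: real systems combine keyword, embedding,
    and domain classifiers; here we use simple keyword checks."""
    signals = set()
    if any(k in query.lower() for k in ("sql", "select", "join")):
        signals.add("keyword:sql")
    if "compliance" in query.lower():
        signals.add("domain:compliance")
    return signals

def route(query, rules):
    """Return the highest-priority rule whose AND/OR condition matches."""
    signals = extract_signals(query)
    matched = [r for r in rules
               if all(s in signals for s in r.get("and", ())) and
               (not r.get("or") or any(s in signals for s in r["or"]))]
    return max(matched, key=lambda r: r["priority"], default=None)

rules = [
    {"name": "sql-specialist", "and": ["keyword:sql"], "priority": 10},
    {"name": "compliance-review", "or": ["domain:compliance"], "priority": 20},
]
print(route("SELECT a JOIN b -- compliance check", rules)["name"])
```

Because rules compose signals rather than map to fixed categories, adding a new routing decision is just appending a rule, which is how the jump from 14 categories to hundreds of rules becomes manageable.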

  18. Article
    Medium · 23w

    Rich and dynamic user interfaces with Flutter and generative UI

    Flutter introduces GenUI SDK in alpha, enabling developers to build dynamic, personalized user interfaces generated by AI models like Gemini. The SDK orchestrates communication between users, Flutter widgets, and AI agents to transform text conversations into interactive experiences. It uses the A2UI protocol for serialization and supports multiple content generators including Google Gemini API, Firebase AI, and custom adapters. The SDK maintains brand consistency while allowing AI to compose UI from custom widget catalogs. Future plans include ADK integration, progressive rendering, full-screen composition, and Dart bytecode for server-driven UI.

  19. Article
    Hacker News · 26w

    samrolken/nokode

    An experimental web server that replaces all application logic with an LLM making real-time decisions. Instead of writing code for routes, controllers, or business logic, the system gives an AI three tools (database queries, HTTP responses, and memory updates) and lets it handle every request dynamically. The proof-of-concept contact manager works but runs 300-6000x slower and costs 100-1000x more than traditional apps, with consistency issues from lack of design memory. Despite severe performance limitations, the experiment demonstrates that LLMs can successfully handle full application logic, suggesting current bottlenecks are engineering challenges rather than fundamental capability gaps.
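A toy version of the idea makes the three-tool loop concrete; the `decide()` stub below stands in for the actual LLM call, and all names and data are hypothetical:

```python
DB = {"contacts": [{"name": "Ada", "email": "ada@example.com"}]}
MEMORY = {}

def tool_query(table):
    return DB.get(table, [])

def tool_respond(status, body):
    return {"status": status, "body": body}

def tool_remember(key, value):
    MEMORY[key] = value

def decide(request):
    """Stub for the LLM: map a request to a plan over the three tools."""
    if request["path"] == "/contacts":
        return [("query", "contacts"),
                ("remember", "last_path", "/contacts"),
                ("respond", 200)]
    return [("respond", 404)]

def handle(request):
    data = None
    for step in decide(request):
        if step[0] == "query":
            data = tool_query(step[1])
        elif step[0] == "remember":
            tool_remember(step[1], step[2])
        elif step[0] == "respond":
            return tool_respond(step[1], data)

print(handle({"path": "/contacts"}))
```

In the real experiment `decide()` is a full LLM round-trip per request, which is exactly where the 300-6000x slowdown comes from.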

  20. Article
    Daily Dose of Data Science (Avi Chawla) · 24w

    Agent Protocol Landscape

    Three emerging protocols are standardizing the fragmented AI agent ecosystem: AG-UI for agent-user interaction in frontends, MCP (Model Context Protocol) for connecting agents to tools and data, and A2A for multi-agent coordination. These protocols work as complementary layers rather than competing standards, with frameworks like CopilotKit providing a unified interface to build with all three. The convergence enables seamless integration between agentic backends, frontends, tools, and multi-agent systems through open-source implementations.

  21. Article
    The New Stack · 26w

    OpenAI Co-Founder: AI Agents Are Still 10 Years Away

    OpenAI co-founder Andrej Karpathy predicts AI agents are still a decade away from replacing human workers, despite recent progress with large language models. He argues the industry is over-hyping current capabilities, citing issues like lack of multimodal functionality, continual learning, and the significant demo-to-product gap. Karpathy draws from his experience leading Tesla's self-driving efforts to illustrate how difficult it is to move from working demos to production-ready systems. He's now focusing on AI education through Eureka Labs, releasing projects like nanochat to help developers understand LLM implementation from the ground up.

  22. Video
    Theo - t3.gg · 24w

    GPT-5.1 is built for normies

    GPT-5.1 represents a shift toward consumer-focused AI with improved conversational tone, customizable personalities, and enhanced safety guardrails. The release prioritizes warmth and accessibility over developer features, with API access delayed. Testing reveals better mental health safeguards and reduced sycophancy compared to GPT-4, though the model's personality options and emoji-heavy responses may not appeal to technical users. The instant variant shows adaptive reasoning that adjusts token usage based on query complexity, while safety evaluations demonstrate meaningful improvements in handling sensitive content.

  23. Article
    Medium · 25w

    Kimi K2 Thinking: Best Agentic Reasoning LLM is here, beats GPT-5, Sonnet 4.5

    Moonshot AI released Kimi K2 Thinking, an open-source LLM that uses test-time scaling to perform extended reasoning chains with up to 300 tool calls per session. Unlike traditional models that scale parameters, K2 scales the number of reasoning steps, maintaining coherence across long chains while integrating web search, code execution, and documentation reading. The model achieves strong results on complex benchmarks like Humanity's Last Exam (44.9%) and SWE-Bench Verified (71.3%) through agentic reasoning. It uses INT4 quantization-aware training for efficiency and offers a Heavy Mode that runs eight parallel reasoning trajectories. K2 represents a shift from word prediction to sustained, tool-augmented cognition.

  24. Article
    ByteByteGo · 25w

    How Perplexity Built an AI Google

    Perplexity AI built an answer engine that combines real-time web search with large language models through a Retrieval-Augmented Generation (RAG) pipeline. The architecture uses Vespa AI for web-scale indexing and retrieval across 200 billion URLs, a model-agnostic orchestration layer that routes queries to appropriate LLMs (both proprietary Sonar models and third-party models like GPT and Claude), and a custom ROSE inference engine running on NVIDIA H100 GPUs. The system processes queries through five stages: intent parsing, live web retrieval, snippet extraction, answer generation with citations, and conversational refinement. This approach addresses AI hallucination issues by grounding responses in verifiable sources while maintaining low latency and cost efficiency through intelligent model routing and infrastructure optimization.
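The five stages can be sketched as a simple pipeline skeleton; every stage here is a stub standing in for the real retrieval and LLM systems, with placeholder data:

```python
def parse_intent(query):
    """Stage 1: intent parsing."""
    return {"query": query, "needs_web": True}

def retrieve(intent):
    """Stage 2: live web retrieval (stubbed with a canned document)."""
    return [{"url": "https://example.com/a", "text": "fact about topic"}]

def extract_snippets(docs):
    """Stage 3: snippet extraction, numbering each source for citation."""
    return [(i + 1, d["url"], d["text"]) for i, d in enumerate(docs)]

def generate_answer(query, snippets):
    """Stage 4: answer generation grounded in the numbered snippets."""
    cites = "".join(f"[{n}]" for n, _, _ in snippets)
    return f"Answer to '{query}' grounded in sources {cites}"

def refine(answer, follow_up=None):
    """Stage 5: conversational refinement on follow-up turns."""
    return answer if follow_up is None else answer + f" (refined: {follow_up})"

def answer(query):
    intent = parse_intent(query)
    snippets = extract_snippets(retrieve(intent))
    return refine(generate_answer(query, snippets))

print(answer("why is the sky blue"))
```

Grounding generation in numbered snippets is what lets the real system attach citations and push back against hallucination.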

  25. Article
    System Design Newsletter · 25w

    A Beginner’s Field Guide to Large Language Models: From Tokens to Agents

    Comprehensive beginner's guide explaining 33 fundamental LLM concepts without mathematics. Covers core mechanics like tokens, embeddings, and parameters; training processes including pre-training and fine-tuning; interaction patterns through prompts and context windows; architectural extensions like RAG and agentic AI; model types and deployment options; performance measurement through benchmarks and metrics; and common failure modes like hallucination and bias with their mitigation strategies. Emphasizes practical understanding over technical depth to help readers use LLMs effectively and recognize their limitations.