Best of LLM, December 2025

  1. Article
    Elevate · 19w

    My LLM coding workflow going into 2026

    A comprehensive guide to using LLM coding assistants effectively in 2026. Key practices include starting with detailed specifications before coding, breaking work into small iterative chunks, providing extensive context to the AI, choosing appropriate models for different tasks, maintaining human oversight through testing and code review, committing frequently for version control safety, customizing AI behavior with rules and examples, leveraging automation as quality gates, and treating AI as a force multiplier rather than a replacement. The workflow emphasizes treating LLMs as junior pair programmers who require guidance, while the developer remains accountable for all code produced.

  2. Article
    Sebastian Raschka · 17w

    The State Of LLMs 2025: Progress, Problems, and Predictions

    A comprehensive 2025 review of large language model developments highlights reinforcement learning with verifiable rewards (RLVR) and the GRPO algorithm as the year's dominant training paradigm, following DeepSeek R1's breakthrough. Key trends include inference-time scaling, tool use integration, and architectural efficiency tweaks like mixture-of-experts and linear attention mechanisms. The analysis addresses benchmarking challenges ("benchmaxxing"), discusses practical LLM usage for coding and writing, and examines the shift toward domain-specific models with proprietary data. Predictions for 2026 emphasize RLVR expansion beyond math/code, increased inference optimization, and the emergence of diffusion models for low-latency tasks.

  3. Article
    LangChain · 20w

    Agent Engineering: A New Discipline

    Agent engineering is an iterative discipline for building reliable LLM-based agents in production. It combines product thinking (prompt writing, defining scope), engineering (building tools, infrastructure, UI), and data science (evaluation, monitoring, analysis) in a continuous cycle of build, test, ship, observe, and refine. Unlike traditional software, agents handle unpredictable natural language inputs and non-deterministic behavior, making production deployment essential for learning what actually works. Successful teams treat shipping as a learning mechanism rather than an end goal, using tracing and evaluation to systematically improve agent reliability through rapid iteration.

  4. Article
    Daily Dose of Data Science (Avi Chawla) · 18w

    The AI Engineering Guidebook

    A comprehensive 350+ page guidebook covering the engineering fundamentals of LLM systems, including model architecture, training, prompt engineering, RAG systems, fine-tuning techniques like LoRA, AI agents, Model Context Protocol, optimization strategies, and deployment considerations. The resource focuses on practical engineering decisions, system design tradeoffs, and real-world implementation patterns rather than surface-level usage.

  5. Article
    Simon Willison · 20w

    JustHTML is a fascinating example of vibe engineering in action

    JustHTML is a pure Python HTML5 parser that passes all 9,200+ browser vendor tests and achieves 100% test coverage. The library was built over several months using AI coding agents (Claude Sonnet, Gemini Pro, Claude Opus) in VS Code, but with extensive human engineering oversight. The developer established the API design, integrated comprehensive test suites, built custom profilers and fuzzers, and made all architectural decisions while letting the AI handle code implementation. This represents "vibe engineering"—using AI agents professionally with proper code review, testing, and engineering practices—rather than "vibe coding" which produces unvetted prototypes. The project demonstrates how experienced engineers can leverage AI as a typing assistant while maintaining responsibility for design, quality, and architectural decisions.

  6. Article
    Simon Willison · 19w

    Your job is to deliver code you have proven to work

    Software engineers must deliver proven, working code rather than untested contributions. This requires both manual testing (seeing the code work yourself, documenting steps, testing edge cases) and automated testing (bundling tests with changes). With AI coding agents like Claude Code, developers should train these tools to prove their changes work through testing before submission. The human developer remains accountable for ensuring code quality and providing evidence that changes function correctly.
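    The "bundle tests with changes" practice can be sketched with a minimal, self-contained example (the function and its bug are invented for illustration; they are not from the article):

    ```python
    def slugify(title):
        """The change under review: collapse runs of whitespace
        into a single hyphen instead of one hyphen per space."""
        return "-".join(title.lower().split())

    def test_slugify_collapses_whitespace():
        # The proof the change works, committed alongside it.
        assert slugify("Hello   World") == "hello-world"

    test_slugify_collapses_whitespace()
    ```

    The point is that the evidence ships in the same commit as the change, so a reviewer (or an AI agent asked to verify its own work) can rerun it.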

  7. Article
    Read the Tea Leaves · 18w

    How I use AI agents to write code

    A developer shares practical strategies for using AI coding agents effectively after transitioning from skepticism to adoption. Key recommendations include creating comprehensive CLAUDE.md files for project context, using automated tests as feedback loops, running separate AI sessions for code review to catch bugs, and leveraging agents for overnight work on side projects. The author acknowledges AI's limitations with UI work and novel projects, describes the shift toward an architect-like role focused on specs and review, but maintains reservations about using AI for open-source contributions due to ownership concerns.

  8. Article
    DEV · 17w

    I Built a Tool to Stop Wasting Time on Toxic Open Source Projects

    A developer built repo-health, a tool that analyzes GitHub repositories to help contributors identify healthy open source projects and avoid toxic ones. The system uses a hybrid scoring approach combining weighted metrics (activity, maintenance, community, documentation) with LLM-based adjustments to account for context like feature-complete projects. Key features include PR metrics analysis, contributor retention visualization, intelligent issue analysis with difficulty scoring, activity pattern detection for spam, and file-issue mapping. The author shares technical implementation details, bug fixes (cache security vulnerability, React hydration mismatch), and lessons learned about focusing on real problems over engineering challenges.
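    The hybrid scoring idea — weighted metrics plus a bounded LLM adjustment — can be sketched as follows. The weights, scale, and adjustment cap here are hypothetical, not repo-health's actual values:

    ```python
    # Hypothetical weights on 0-100 sub-scores; illustrative only.
    WEIGHTS = {"activity": 0.35, "maintenance": 0.30,
               "community": 0.20, "documentation": 0.15}

    def health_score(metrics, llm_adjustment=0.0):
        """Weighted base score plus a clamped LLM-based adjustment,
        e.g. so a feature-complete project isn't penalised as 'dead'."""
        base = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
        adjustment = max(-10.0, min(10.0, llm_adjustment))
        return max(0.0, min(100.0, base + adjustment))

    metrics = {"activity": 40, "maintenance": 90,
               "community": 70, "documentation": 85}
    print(health_score(metrics, llm_adjustment=8.0))  # ~75.75
    ```

    Clamping the LLM's contribution keeps the model from overriding the objective metrics — it can nudge the score for context, not rewrite it.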

  9. Article
    Weaviate · 20w

    Context Engineering for AI Agents

    Context engineering is the discipline of designing systems that feed LLMs the right information at the right time, addressing the fundamental constraint of finite context windows. It encompasses six interdependent pillars: agents that orchestrate decisions, query augmentation that refines user input, retrieval that connects to external knowledge, prompting that guides reasoning, memory that preserves history, and tools that enable real-world action. Unlike prompt engineering, which focuses on phrasing instructions, context engineering builds the architecture around the model, treating the context window as a scarce resource and designing retrieval, memory systems, and tool integrations to maximize signal while avoiding context poisoning, distraction, confusion, and clash.
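    The "context window as a scarce resource" framing can be made concrete with a budget-aware assembly sketch. This is a minimal illustration, not Weaviate's implementation; the whitespace tokenizer stands in for a real one such as tiktoken:

    ```python
    def assemble_context(system_prompt, memory, retrieved, budget, count_tokens):
        """Pack the window under a token budget: fixed instructions first,
        then memory, then retrieved chunks, skipping whatever won't fit."""
        parts = [system_prompt]
        used = count_tokens(system_prompt)
        for chunk in memory + retrieved:  # assumed pre-sorted by priority
            cost = count_tokens(chunk)
            if used + cost > budget:
                continue  # drop whole chunks rather than truncate mid-thought
            parts.append(chunk)
            used += cost
        return "\n\n".join(parts)

    # Crude whitespace "tokenizer" for the sketch.
    words = lambda text: len(text.split())

    ctx = assemble_context("You are a helpful agent.",
                           ["User prefers Python."],
                           ["Doc: pgvector supports HNSW."],
                           budget=50, count_tokens=words)
    ```

    Dropping whole chunks instead of truncating them is one simple guard against the "context confusion" failure mode the article names: a half-sentence in the window is worse than its absence.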

  10. Video
    Fireship · 20w

    OpenAI is edging us all... Closer to AGI

    OpenAI released GPT 5.2, reclaiming leadership in AI benchmarks after Google's Gemini 3 dominance. The model shows a 390x efficiency improvement over its predecessor and tops the ARC AGI benchmark, which tests reasoning and generalization rather than memorization. The release includes improved coding capabilities and fewer hallucinations, though practical differences may be subtle for average users. OpenAI also secured a $1 billion deal with Disney for AI-generated content featuring iconic characters.

  11. Article
    Hacker News · 21w

    LLMs are a failure. A new AI winter is coming.

    Large Language Models (LLMs) face fundamental limitations that make them unsuitable for most practical applications. The core issue is that transformers generate plausible-sounding output by predicting the next token, which inevitably leads to hallucinations when the model lacks relevant training data. This results in a 5-40% failure rate that cannot be eliminated through scaling or fine-tuning. The author predicts an imminent AI bubble burst, with corporate AI projects failing at a 95% rate, similar to the dot-com crash. While some use cases will survive, the technology's inability to reliably distinguish correct from incorrect output makes it dangerous for critical applications like medicine, education, and law enforcement.

  12. Article
    Laravel News · 18w

    Laravel TOON: Reduce LLM Token Usage by 40-60%

    Laravel TOON is a package that reduces LLM token usage by 40-60% through intelligent data serialization. It flattens nested JSON objects into a compact YAML-like format using dot notation, eliminating repeated keys, braces, and quotes. Real-world benchmarks show savings of ~2,200 tokens per 50-record response. The package includes configuration options for omitting null values, truncating strings, aliasing keys, and controlling nesting depth. It's particularly useful for MCP servers and LLM context optimization, with full round-trip fidelity for decoding back to original structures.
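    The flattening idea — dot-notation keys replacing nested braces and repeated punctuation — can be sketched in a few lines. This is a language-neutral illustration of the technique, not the PHP package's actual implementation:

    ```python
    def flatten(data, prefix=""):
        """Flatten nested dicts into dot-notation keys, TOON-style."""
        flat = {}
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                flat.update(flatten(value, path))
            else:
                flat[path] = value
        return flat

    def to_compact(flat):
        """Render as a compact YAML-like block: no braces, no quotes."""
        return "\n".join(f"{k}: {v}" for k, v in flat.items())

    record = {"user": {"name": "Ada", "address": {"city": "London"}},
              "active": True}
    print(to_compact(flatten(record)))
    # user.name: Ada
    # user.address.city: London
    # active: True
    ```

    The token savings come from emitting each structural element once: over 50 near-identical records, the repeated keys, braces, and quotes that JSON re-serializes per record dominate the payload, which is where the article's ~2,200-token figure comes from.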

  13. Article
    Keshav Ashiya · 20w

    Docify: Building a Production RAG System for Knowledge Management

    Docify is an open-source RAG system that processes documents locally while maintaining AI capabilities. The architecture uses 11 specialized services including async embedding generation with Celery, hybrid search combining pgvector and BM25, multi-factor ranking with citation verification, and token-aware context assembly. Built with PostgreSQL pgvector for vector storage, Redis for task queuing, and Ollama for local LLM inference, it supports heterogeneous document formats and implements deduplication via SHA-256 hashing. The system uses HNSW indexing for sub-200ms vector search, reciprocal rank fusion for search result merging, and citation verification to reduce hallucinations.
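    Reciprocal rank fusion, used here to merge the pgvector and BM25 result lists, is a standard formula: each document scores the sum of 1/(k + rank) across the lists it appears in. A minimal sketch (the document IDs are made up):

    ```python
    def reciprocal_rank_fusion(rankings, k=60):
        """Merge ranked lists: each doc scores sum(1 / (k + rank)),
        so agreement across lists beats a single high placement."""
        scores = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    vector_hits = ["doc_a", "doc_b", "doc_c"]   # from pgvector / HNSW
    bm25_hits = ["doc_b", "doc_d", "doc_a"]     # from keyword search
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
    # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
    ```

    Note that doc_b wins despite never ranking first in either list — RRF rewards cross-list agreement, which is exactly why it suits hybrid semantic-plus-keyword search.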

  14. Video
    Web Dev Cody · 18w

    "I've never felt this much behind as a programmer." - Andrej Karpathy

    Andrej Karpathy, OpenAI founding member and GPT contributor, expressed feeling behind as a programmer due to rapid AI tooling evolution. He describes the profession being "dramatically refactored" with new layers of abstraction including agents, prompts, MCP, and IDE integrations. Karpathy emphasizes experienced developers have an advantage if they adapt quickly rather than reject these tools. The discussion covers practical AI coding tools like Claude Opus 4.5, Cursor, and code generation capabilities that can complete features in minutes versus hours of manual work. Developers who embrace AI workflows gain significant productivity advantages over those still coding traditionally.

  15. Article
    Daily Dose of Data Science (Avi Chawla) · 18w

    [Hands-on] Deploy and Run LLMs on your Phone!

    Fine-tune and deploy LLMs directly on iOS and Android devices using UnslothAI, TorchAO, and ExecuTorch. The tutorial walks through loading Qwen3-0.6B, preparing reasoning and chat datasets, training with quantization-aware methods, exporting to mobile-ready .pte format, and running the model locally on iPhone at ~25 tokens/second. The resulting model is ~470MB and runs 100% on-device without requiring cloud connectivity.

  16. Article
    MLflow · 18w

    AI Observability for Every TypeScript LLM Stack

    MLflow 3.6 introduces automatic tracing integrations for TypeScript and JavaScript LLM frameworks including Vercel AI SDK, LangChain.js, LangGraph.js, Mastra, Anthropic, and Gemini. These integrations use OpenTelemetry to send traces to MLflow's tracking server, capturing prompt/response payloads, token usage, tool results, and errors. Setup requires minimal configuration—typically just pointing an OTLP endpoint to your MLflow server and wrapping SDK clients. MLflow can be deployed via Docker Compose or managed cloud services, eliminating the need for a Python environment alongside JavaScript stacks.

  17. Article
    Read the Tea Leaves · 18w

    An experiment in vibe coding

    A developer built a travel itinerary web app using AI coding assistants (Bolt.new and Claude Code) with minimal manual coding. The experiment succeeded in creating a functional PWA with a PocketBase backend for $21/month, but revealed limitations: AI tools aren't ready for non-programmers, the generated code had accessibility issues and React performance problems requiring manual optimization, and token limits were frustrating. The experience highlights how AI-generated code works well for personal projects where requirements are simple and only one user needs to be satisfied, but remains unsuitable for professional codebases that teams must understand and customers must rely on.

  18. Article
    Sebastian Raschka · 20w

    From Random Forests to RLVR: A Short History of ML/AI Hello Worlds

    A chronological overview traces the evolution of beginner-friendly ML/AI examples from 2013 to 2025. Starting with Random Forests on Iris datasets and XGBoost on Kaggle competitions, it progresses through neural networks (MLPs, AlexNet), transformer models (DistilBERT, Llama 2 with LoRA), and culminates with reasoning models using RLVR on mathematical datasets. Each milestone reflects when methods became mainstream and accessible, often lagging years behind their initial publication due to tooling maturity and community adoption.

  19. Article
    Tech Lead Digest · 21w

    Technical Deflation

    AI is making software development progressively easier and cheaper, creating a "technical deflation" effect where startups may delay building features expecting future tools will make development even simpler. This phenomenon gives late-movers an advantage, as companies starting 6-12 months later can build the same functionality with less effort and complexity. The trend suggests startups should focus more on distribution, customer understanding, and sales rather than pure technical execution, since the building itself is becoming commoditized. The rapid pace of AI improvement means timing and strategic patience may matter more than being first to market.

  20. Article
    Hugging Face · 21w

    Transformers v5: Simple model definitions powering the AI ecosystem

    Hugging Face releases Transformers v5, marking five years since v4 with daily installs growing from 20,000 to 3 million. The library now supports over 400 model architectures and 750,000 community checkpoints. Version 5 focuses on simplicity through modular design, improved training support for both pre-training and fine-tuning, enhanced inference capabilities with continuous batching and a new serving API, and first-class quantization support. The release emphasizes interoperability across the ecosystem, enabling seamless integration with inference engines like vLLM and SGLang, local deployment tools like llama.cpp and MLX, and training frameworks like Unsloth and Axolotl.

  21. Article
    Salesforce Engineering · 21w

    How Agentforce Achieved 3–5x Faster Response Times

    Salesforce's Forward Deployed Engineering team optimized Agentforce for a multi-brand retailer by separating deterministic logic from LLM reasoning, moving hierarchical processing from prompts to Apex code. They consolidated multi-stage LLM calls into single passes and optimized Data 360 retrieval, reducing end-to-end latency by 75% (approximately 20 seconds). The team chose a multi-agent architecture over a unified model, enabling brand-specific conversational experiences while maintaining a shared foundation that accelerated subsequent brand deployments by 5x.

  22. Article
    Programming Digest · 21w

    AI Skeptic to AI Pragmatist

    A developer shares their journey from AI skepticism to pragmatic adoption after months of hands-on experience with LLMs and AI coding assistants. Key learnings include the importance of providing structured context (who, what, why, how), treating AI as a collaborator rather than magic, using git commits as save points, switching between models for different tasks, and creating MCP servers for framework-specific knowledge. The author emphasizes that AI requires a learning curve and intentional practice to use effectively, advocating for industry discussions that acknowledge AI's utility while addressing ethical concerns.

  23. Article
    Prismic · 19w

    Agent Lab: Building a Hybrid Team of Humans and AI Agents

    A marketing team experimented with building AI agents to create a hybrid human-AI workforce over one quarter. They used n8n as their primary automation platform, structured the project with dedicated channels and weekly check-ins, and built various agents including an executive briefing agent, content workflow system, and LLM benchmarking tool. Key learnings included: not everything needs full autonomy (simple workflows often work better), tool integration is harder than expected, scoping down is critical for success, and understanding processes deeply before automating is essential. Some agents became valuable company-wide tools while others served mainly as learning exercises.

  24. Article
    vLLM · 20w

    Token-Level Truth: Real-Time Hallucination Detection for Production LLMs

    HaluGate is a real-time hallucination detection system for production LLMs that identifies when models generate claims contradicting provided context. It uses a two-stage pipeline: first classifying whether queries need fact-checking (96.4% accuracy, 12ms latency), then performing token-level detection with NLI explanation for factual queries (76-162ms overhead). Built with ModernBERT and native Rust/Candle integration, it runs without Python dependencies, adding negligible latency compared to LLM generation times. The system integrates with vLLM's Signal-Decision Architecture, exposing results via HTTP headers for downstream policy enforcement. Unlike LLM-as-judge approaches, HaluGate provides explainable, consistent verification specifically for extrinsic hallucinations where tool/RAG context exists.
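    The two-stage gating idea can be illustrated with a deliberately toy sketch: keyword routing and a vocabulary lookup stand in for HaluGate's trained ModernBERT classifier and NLI-based token-level detector, and all names here are hypothetical:

    ```python
    import string

    def needs_fact_check(query):
        """Stage 1 (toy): route only factual-looking queries to verification.
        The real system uses a trained classifier, not a keyword list."""
        return any(w in query.lower() for w in ("who", "what", "when", "where", "how"))

    def flag_unsupported_tokens(answer, context):
        """Stage 2 (toy): flag answer tokens absent from the retrieved
        context. The real system does NLI-based token-level detection."""
        norm = lambda tok: tok.lower().strip(string.punctuation)
        context_vocab = {norm(tok) for tok in context.split()}
        return [tok for tok in answer.split() if norm(tok) not in context_vocab]

    context = "The Eiffel Tower is 330 metres tall."
    answer = "The Eiffel Tower is 330 metres tall and opened in 1887."
    if needs_fact_check("How tall is the Eiffel Tower?"):
        print(flag_unsupported_tokens(answer, context))
        # ['and', 'opened', 'in', '1887.']
    ```

    The structure mirrors the pipeline described above: a cheap first-stage gate so most queries skip verification entirely, and a second stage that localizes the unsupported span rather than judging the whole answer — here catching the claim the context never made.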

  25. Video
    Siliconversations · 17w

    Why Does The Seahorse Emoji Drive ChatGPT Insane?

    ChatGPT enters an infinite loop when asked about the seahorse emoji because it predicts one should exist but cannot produce it. As a next-word predictor, the model gets stuck repeatedly trying to correct itself. The issue likely stems from Reddit posts in its training data where people falsely remember a seahorse emoji existing (Mandela effect), creating a contradiction between what the model expects to exist and what it can actually output.