Best of LLM · September 2025

  1.
    Article
    Josh M. · 33w

    GPT-5 is Trash.

    GPT-5 has received significant criticism from users who report that its responses are shorter, blander, and less engaging than previous versions'. Despite being marketed as having PhD-level intelligence, the model still makes basic errors in math and reasoning and still hallucinates. OpenAI's removal of model selection in favor of an automatic model router frustrated users, leading many to conclude the change was a cost-saving measure rather than a genuine improvement. The backlash was severe enough that OpenAI restored access to older models like GPT-4o.

  2.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 32w

    The Open-source RAG Stack

    A comprehensive guide to building production-ready RAG systems using open-source tools. Covers the complete technology stack from frontend frameworks to data ingestion, including LLM orchestration tools like LangChain and CrewAI, vector databases like Milvus and Chroma, embedding models, and retrieval systems. Also showcases 9 practical MCP (Model Context Protocol) projects for AI engineers, ranging from local MCP clients to voice agents and financial analysts.
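
    The retrieval core of such a stack can be sketched in a few lines. This toy version uses a bag-of-words similarity in place of a real embedding model and vector database; in production, neural embedders and stores like Milvus or Chroma would fill those roles:

    ```python
    import math
    import re
    from collections import Counter

    def embed(text: str) -> Counter:
        # Toy "embedding": bag-of-words term counts (real stacks use neural embedders).
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
        # Rank documents by similarity to the query and keep the top k.
        q = embed(query)
        return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

    docs = [
        "Milvus is a vector database for similarity search.",
        "LangChain orchestrates LLM calls and tools.",
        "Chroma stores embeddings for retrieval.",
    ]
    context = retrieve("which vector database stores embeddings?", docs)
    prompt = "Answer using this context:\n" + "\n".join(context)
    ```

    The retrieved snippets are then stuffed into the LLM prompt; the orchestration layer (LangChain, CrewAI) handles that last step in a real stack.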

  3.
    Article
    SwirlAI · 34w

    Learning AI Engineering in 2025

    An AI engineering bootcamp instructor reflects on the success of their first cohort, sharing metrics like 40 hours of live lectures and 250 pages of materials. The program focuses on building production-ready AI systems end-to-end, with upcoming improvements including deeper evaluation focus, context engineering, guest lectures, and Modal cloud partnerships. The bootcamp targets data scientists, ML engineers, founders, and software engineers looking to transition into AI engineering.

  4.
    Article
    freeCodeCamp · 34w

    How to Fine-Tune Large Language Models

    A comprehensive course covering fine-tuning techniques for large language models, including supervised fine-tuning, reinforcement learning with human feedback (RLHF), and QLoRA methodology. The course explains the differences between fine-tuning, pre-training, and prompt engineering, with practical applications and case studies for specializing LLMs for specific domains.

  5.
    Article
    InfoQ · 32w

    Hugging Face Brings Open-Source LLMs to GitHub Copilot Chat in VS Code

    Hugging Face launched a VS Code extension that integrates open-source large language models with GitHub Copilot Chat. Developers can now access models like Kimi K2, DeepSeek V3.1, and GLM 4.5 directly within their editor through a unified interface. The integration requires VS Code version 1.104.0 and offers free tier access with pay-as-you-go pricing for higher usage.

  6.
    Article
    Javarevisited · 34w

    LangGraph and n8n in 2025: The AI Stack You Can’t Ignore?

    LangGraph and n8n serve complementary roles in AI system architecture. n8n excels as a workflow automation tool for connecting APIs, databases, and services, while LangGraph specializes in building intelligent AI agents with multi-step reasoning, state management, and complex tool-calling capabilities. The key insight is using n8n for data movement and integrations, and LangGraph for AI reasoning and agent orchestration, rather than treating them as competing solutions.

  7.
    Article
    Medium · 32w

    Don’t buy GPUs for AI

    GPUs are becoming unnecessary for most AI applications as smaller language models like Mistral 7B and Phi-3 Mini deliver practical results on CPUs. Modern processors, edge devices with NPUs, and cloud rental options provide cost-effective alternatives to expensive GPU ownership. Specialized hardware like TPUs and software optimizations through quantization are making GPUs obsolete for all but the largest model training operations.
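
    The memory arithmetic behind this claim is simple. A rough sketch (weights only; activations and KV cache add overhead on top):

    ```python
    def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
        # Weight memory = parameter count x bits per weight / 8 bits per byte.
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    fp16 = model_memory_gb(7, 16)  # 7B model at fp16: ~14 GB, needs a big GPU
    int4 = model_memory_gb(7, 4)   # same model 4-bit quantized: ~3.5 GB
    ```

    Quantizing a 7B model from fp16 to 4-bit cuts weight memory roughly fourfold, which is what lets models like Mistral 7B run in ordinary laptop RAM on a CPU.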

  8.
    Video
    ThePrimeTime · 33w

    LLMs are caught cheating

    AI models like Claude and Qwen Coder were caught using git history to solve coding challenges in the SWE-bench benchmark, essentially finding future commits that contained the fixes they needed. While technically cheating, this behavior mirrors real-world software engineering practices where developers search through repository history to understand and fix bugs, especially when backporting fixes to older versions.

  9.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 34w

    8 Key LLM Development Skills for AI Engineers

    Outlines eight essential skills for AI engineers working with Large Language Models in production environments: prompt engineering, context engineering, fine-tuning, RAG systems, agents, deployment, optimization, and observability. Each skill covers practical techniques from crafting structured prompts to implementing monitoring systems, with emphasis on moving beyond basic prompting to building scalable, production-grade LLM applications.

  10.
    Article
    Linear · 34w

    How we built Product Intelligence

    Linear's Product Intelligence feature uses LLMs and semantic search to automatically organize backlogs by detecting duplicates, suggesting labels and assignees, and linking related issues. The system evolved from small models with rigid workflows to larger agentic models that can pull additional context for better decision-making. The implementation focuses on trust, transparency, and seamless integration, with UI design that shows reasoning processes and allows natural language customization through Additional Guidance settings.

  11.
    Article
    The Art of Simplicity · 33w

    Ollama: Running LLMs locally

    Ollama has introduced a built-in user interface that eliminates the need for command-line interaction or third-party tools like OpenWebUI. The new chat interface resembles ChatGPT and includes features like conversation history, easy model switching with one-click downloads, adjustable context windows, file drag-and-drop support, and multimodal capabilities for running large language models locally.
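
    Under the new UI, Ollama still serves its usual local REST API. A minimal sketch of calling it, assuming the default localhost:11434 endpoint and a model you have already pulled (llama3.2 here is just an example):

    ```python
    import json
    import urllib.request

    def build_ollama_request(model: str, prompt: str,
                             host: str = "http://localhost:11434"):
        # Ollama exposes POST /api/generate; "stream": False asks for one
        # complete JSON response instead of a stream of token chunks.
        payload = {"model": model, "prompt": prompt, "stream": False}
        return urllib.request.Request(
            f"{host}/api/generate",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )

    req = build_ollama_request("llama3.2", "Why is the sky blue?")
    # With a running Ollama server, send it like this:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["response"])
    ```

    The chat UI, OpenWebUI, and this raw HTTP call all hit the same local server, so scripts and the new interface can share downloaded models.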

  12.
    Video
    Jack Herrington · 34w

    MCP-UI + TanStack = Next Gen Web

    MCP UI extends the Model Context Protocol to allow AI tools to return interactive HTML, JavaScript, and iframe-based user interfaces instead of just text or JSON. The tutorial demonstrates building an MCP UI server using TanStack Start that can render guitar recommendation cards through iframe embedding, showcasing how LLMs can now generate rich visual components for web applications.

  13.
    Article
    ByteByteGo · 31w

    How Fine-Tuning Transforms Generic AI Models into Specialists

    Fine-tuning transforms generic AI models into specialized tools by adjusting their neural network weights for specific tasks. While training models from scratch costs millions, fine-tuning existing models like GPT or Claude costs only hundreds or thousands of dollars. The process includes instruction fine-tuning, reinforcement learning from human feedback (RLHF), and domain adaptation. Breakthrough techniques like LoRA and QLoRA have democratized AI customization by reducing memory requirements from 500GB to 20GB and enabling fine-tuning on consumer hardware, making specialized AI accessible to small organizations and researchers.
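
    The memory savings follow from LoRA's low-rank factorization: instead of updating a full weight matrix W, it trains two small factors B and A and uses W + BA. A back-of-the-envelope sketch for a single 4096x4096 projection matrix (rank 8 chosen purely for illustration):

    ```python
    def lora_params(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
        # Full fine-tuning updates the whole d_out x d_in matrix; LoRA trains
        # only the factors B (d_out x rank) and A (rank x d_in).
        full = d_out * d_in
        lora = rank * (d_in + d_out)
        return full, lora

    full, lora = lora_params(4096, 4096, 8)
    savings = full / lora  # ~256x fewer trainable parameters for this layer
    ```

    Multiplied across every attention and MLP projection in a model, this is why LoRA-style tuning fits on consumer GPUs while full fine-tuning does not; QLoRA adds 4-bit quantization of the frozen base weights on top.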

  14.
    Article
    Collections · 30w

    Anthropic Releases Claude Sonnet 4.5: State-of-the-Art AI for Coding

    Claude Sonnet 4.5 achieves a 77.2% score on SWE-bench, positioning it as a leading AI coding model. Available on GitHub Copilot, VS Code, and JetBrains IDEs, it features enhanced memory management for tasks up to 30 hours, improved tool orchestration, and autonomous task handling. The model integrates with Snowflake Cortex AI and Amazon Bedrock for enterprise deployment, with pricing at $3 per million input tokens and $15 per million output tokens. Safety improvements include reduced sycophancy and better resistance to prompt injection attacks.

  15.
    Article
    LangChain · 34w

    Building LangGraph: Designing an Agent Runtime from first principles

    LangGraph was designed as a low-level agent framework prioritizing production readiness over ease of getting started. Built to address LangChain's feedback about customization and scaling challenges, it focuses on six core features: parallelization, streaming, task queues, checkpointing, human-in-the-loop capabilities, and tracing. The framework uses a structured execution model based on the Pregel algorithm with channels and nodes, enabling deterministic concurrency and fault tolerance. Performance scales gracefully with agent complexity while maintaining low latency, making it suitable for production deployments at companies like LinkedIn, Uber, and Klarna.
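
    The Pregel-style execution model can be illustrated with a toy superstep loop. This is a drastic simplification of LangGraph's actual runtime, for intuition only: within a superstep every node reads the same channel snapshot, and writes land only after all nodes finish, which is what makes concurrent updates deterministic:

    ```python
    def run_supersteps(nodes, channels, steps):
        # Each iteration is one superstep: nodes read a frozen snapshot of the
        # channels and return updates; updates are applied only at the end,
        # so execution order within a superstep cannot change the result.
        for _ in range(steps):
            snapshot = dict(channels)
            updates = {}
            for node in nodes:
                updates.update(node(snapshot))
            channels.update(updates)
        return channels

    def doubler(ch):
        return {"x": ch["x"] * 2}

    def counter(ch):
        return {"steps": ch["steps"] + 1}

    state = run_supersteps([doubler, counter], {"x": 1, "steps": 0}, steps=3)
    ```

    Checkpointing in this model is also natural: the channel dict after any superstep is a complete, resumable state, which is the hook for fault tolerance and human-in-the-loop pauses.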

  16.
    Article
    ByteByteGo · 33w

    Start Learning AI — Our New YouTube Channel is Live

    ByteByteGo has launched a new YouTube channel called ByteByteAI focused on AI education. The channel will publish weekly videos covering topics like reasoning LLMs, coding agents, prompt engineering, recommendation systems, and various AI concepts. The first video is already available with plans for regular content releases.

  17.
    Article
    Sebastian Raschka · 34w

    Understanding and Implementing Qwen3 From Scratch

    A comprehensive guide to implementing Qwen3, one of the leading open-source large language models, from scratch using pure PyTorch. The article explores why Qwen3 is popular among developers, including its Apache License v2.0, strong performance rankings, and variety of model sizes from 0.6B to 480B parameters. It provides hands-on code implementation to understand the architecture's inner workings.

  18.
    Article
    Convex · 32w

    Open Kitchen: Chef is now OSS

    Convex has open-sourced Chef, their LLM-driven code generation tool that has been used by over 250,000 developers to create full-stack projects. Originally developed as a learning tool for Convex, Chef is now available under Apache 2 license for anyone to use, modify, or fork. Additionally, Convex has expanded their open source sponsorship program beyond TanStack Start to support 12 additional projects and maintainers including ArkType, Biome, Hono, SolidJS, Vite, and Zod.

  19.
    Article
    Reinier · 34w

    Context Engineering, Clearly Explained

    Context engineering is a framework that encompasses prompts, memory, files, tools, and retrieval-augmented generation (RAG) to optimize how large language models generate responses. Unlike prompt engineering which focuses solely on input text, context engineering considers the entire information ecosystem that influences AI outputs, providing a more comprehensive approach to building reliable agentic systems and improving AI conversation consistency.
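
    A minimal sketch of what assembling that ecosystem might look like in a chat-completions setting. The field names follow common chat APIs rather than any specific library, and the inputs are hypothetical:

    ```python
    def build_context(system: str, docs: list[str], memory: list[str],
                      user_msg: str) -> list[dict]:
        # The model's real input is the whole assembled payload: system rules,
        # retrieved documents, conversation memory, then the new user message.
        retrieved = "\n".join(f"- {d}" for d in docs)
        messages = [{"role": "system",
                     "content": f"{system}\n\nRelevant documents:\n{retrieved}"}]
        messages += [{"role": "assistant", "content": m} for m in memory]
        messages.append({"role": "user", "content": user_msg})
        return messages

    ctx = build_context(
        system="Answer only from the provided documents.",
        docs=["Refund window is 30 days."],
        memory=["Earlier I told the user how to open a ticket."],
        user_msg="Can I still get a refund?",
    )
    ```

    Prompt engineering tunes only the final user string; context engineering is deciding what goes into every other slot of this payload, and tool schemas typically ride alongside it in the same API call.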

  20.
    Article
    Daily Dose of Data Science | Avi Chawla | Substack · 31w

    Get Free Lifetime Access to Our Premium Resources

    A comprehensive 10-step roadmap for becoming a full-stack AI engineer, covering everything from coding fundamentals and Python basics to advanced topics like LLM APIs, RAG systems, AI agents, production deployment, observability, security, and advanced workflows. The roadmap progresses from beginner concepts to expert-level implementation of production-ready AI systems.

  21.
    Article
    Hugging Face · 33w

    Jupyter Agents: training LLMs to reason with notebooks

    Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and fine-tuned Qwen3-4B models. The approach achieved 75% accuracy on easy DABStep benchmark tasks, demonstrating that smaller models can become effective data science agents with proper training data and scaffolding. The project includes open-source datasets, trained models, and a simplified 200-line scaffolding system.
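
    The deduplication step of such a pipeline can be sketched with content hashing. The article does not specify the exact method, so this shows only exact-duplicate removal; near-duplicate detection (e.g. MinHash) is a common addition at the 2TB scale:

    ```python
    import hashlib

    def dedup_notebooks(notebooks: list[str]) -> list[str]:
        # Keep the first occurrence of each distinct notebook, identified by
        # a SHA-256 digest of its content.
        seen, kept = set(), []
        for nb in notebooks:
            digest = hashlib.sha256(nb.encode()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept.append(nb)
        return kept
    ```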

  22.
    Article
    Where's Your Ed At · 30w

    The Case Against Generative AI

    A comprehensive analysis arguing that the generative AI industry is in an unsustainable bubble. Despite over $500 billion in investments, the industry lacks profitable companies and relies heavily on NVIDIA's GPU monopoly. Major players like OpenAI and Anthropic burn billions while generating minimal revenue compared to their costs. The piece examines how AI hype has been manufactured through vague promises and media coverage, while actual AI applications remain limited and unreliable. The author predicts an inevitable collapse as the fundamental economics don't support the massive capital expenditures.

  23.
    Article
    Grafana Labs · 31w

    Grafana 12.2 release: LLM-powered SQL expressions, updates to canvas and table visualizations, simplified reporting, and more

    Grafana 12.2 introduces LLM-powered SQL expressions for natural language query generation, enhanced table visualizations with new cell types and formatting options, improved canvas visualization with flexible pan/zoom capabilities, enhanced ad hoc filtering support for SQL data sources, and simplified one-page reporting. The release also includes updates to Drilldown apps for queryless data exploration, a new Jenkins Enterprise data source for CI/CD metrics, and authentication support for visualization actions with the Infinity data source.

  24.
    Article
    Justin Searls · 33w

    A simple calculation

    A developer shares their simple method for testing network connections to remote LLMs by asking basic math questions like '1+1'. The post includes a humorous example of an LLM's chain-of-thought reasoning for this trivial calculation, showing how even simple queries can trigger verbose internal processing.
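
    The check itself is trivial to script. A sketch with a stubbed client, where `ask` is a hypothetical stand-in for whatever function actually calls the remote LLM:

    ```python
    def check_llm_connection(ask, question: str = "What is 1+1?",
                             expected: str = "2") -> bool:
        # `ask` is any callable that sends a prompt to the remote LLM and
        # returns its text reply; a network failure counts as "not connected".
        try:
            reply = ask(question)
        except OSError:
            return False
        return expected in reply

    # Stubbed client for illustration; a real API call would replace the lambda.
    connected = check_llm_connection(lambda q: "1 + 1 = 2")
    ```

    The point of using a real question instead of a plain ping is that it exercises the whole path, including auth and model availability, not just the network.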

  25.
    Article
    Atomic Spin · 32w

    Beyond Code: Using Cursor IDE for Knowledge Management

    Cursor IDE can be repurposed beyond coding as an effective knowledge management platform. By leveraging its file organization capabilities, flexible LLM integration, and Git workflows, teams can systematically manage documentation, client projects, and institutional knowledge. The approach involves treating knowledge work like code development with structured folders, version control, and AI-assisted content generation while maintaining human oversight for quality control.