Best of LLM · March 2026

  1. Article
    freeCodeCamp · 7w

    Learn how to fine-tune LLMs in 12 hours

    A 12-hour freeCodeCamp course covering LLM fine-tuning from foundations to enterprise applications. The curriculum spans four major areas: Parameter-Efficient Fine-Tuning (PEFT) with LoRA and QLoRA for consumer hardware, advanced alignment techniques including RLHF and Direct Preference Optimization (DPO), high-performance tooling like Unsloth, Axolotl, and Llama Factory, and enterprise/multimodal AI covering Vision Transformers, multimodal architectures, and APIs from OpenAI and Google Cloud Vertex AI.
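
    The core trick behind PEFT methods like LoRA is replacing a full weight-matrix update with a low-rank factorization. A minimal sketch of the parameter arithmetic (illustrative numbers, not figures from the course):

```python
# Sketch of the parameter savings behind LoRA: instead of training a full
# d x k weight delta, train two small factors B (d x r) and A (r x k).

def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full fine-tune params, LoRA params) for one d x k matrix."""
    full = d * k            # every entry of the weight delta is trainable
    lora = r * (d + k)      # B has d*r entries, A has r*k entries
    return full, lora

# A 4096 x 4096 attention projection with LoRA rank 8:
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)   # 16777216 65536 256
```

    A 256x reduction in trainable parameters per matrix is why LoRA and its quantized variant QLoRA make fine-tuning feasible on consumer hardware.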

  2. Article
    daily.dev Changelog · 6w

    daily.dev skills are here

    daily.dev launched Skills, a set of plug-and-play integrations that connect AI coding agents to daily.dev's real-time, community-vetted developer content. Three skills are available: daily-dev (personalized feed, bookmarks, search), daily-dev-ask (developer-focused web search grounded in upvoted articles), and daily-dev-agentic (continuous self-improvement via fresh article ingestion). Skills work with Claude Code, Cursor, Codex, and OpenClaw, and require a Plus subscription. Setup takes about 30 seconds via the API settings page.

  3. Article
    Neil Madden · 8w

    Why I don’t use LLMs for programming

    A brief personal stance on why LLMs shouldn't replace the act of programming, drawing on quotes from Douglas Adams, Carl Friedrich Gauss, and Alan Perlis. The core argument is that programming is fundamentally a learning and thinking process — the act of breaking down problems for a machine teaches the programmer. Outsourcing that to an LLM bypasses the cognitive work that generates real understanding and satisfaction.

  4. Article
    Where's Your Ed At · 5w

    The AI Industry Is Lying To You

    A detailed investigative critique arguing that the AI industry is systematically overstating data center construction progress. Key findings: only 33% of announced US data center capacity (241GW) is under active development, and actual new capacity brought online in 2025 was roughly 3GW of IT load. NVIDIA is selling GPUs years ahead of when data centers can actually be built or powered, creating a massive gap between sales and operationalization. The piece also covers NVIDIA GPU smuggling to China via Supermicro co-founder Wally Liaw's arrest, suspicious activity around Singapore-based Megaspeed, and the internal damage caused by hyperscalers forcing employees to use AI coding tools — leading to security incidents at Meta and Amazon outages. The author concludes the entire AI buildout is a capital misallocation bubble propped up by misleading media coverage and opaque industry reporting.

  5. Article
    Redpanda · 6w

    Introducing Redpanda AI SDK for Go

    Redpanda has open-sourced an AI SDK for Go designed for production use. The SDK addresses gaps in existing Go AI tooling by providing provider portability across OpenAI, Anthropic, Google Gemini, and AWS Bedrock, idiomatic streaming, composable middleware with layered interceptors, an Agent-to-Agent (A2A) adapter, a flexible tool system with MCP support, and a simulated LLM framework for deterministic testing. It powers Redpanda's own Agentic Data Plane and is available at github.com/redpanda-data/ai-sdk-go.

  6. Article
    The New Stack · 7w

    Andrej Karpathy’s 630-line Python script ran 50 experiments overnight without any human input

    Andrej Karpathy released AutoResearch, a 630-line Python script that autonomously ran 50 ML experiments overnight on a single GPU without human input. The core design rests on three primitives: a single editable asset (the training script), a scalar metric (validation bits per byte), and a time-boxed evaluation cycle. A key insight is that a Markdown file called program.md serves as the human-agent interface, encoding search strategy, constraints, and stopping criteria in structured prose. This pattern generalizes beyond ML training to database query optimization, support ticket routing, and RAG pipeline tuning. The human role shifts from running experiments to writing experimental protocols, with the quality of the program.md document becoming the binding constraint on autonomous loop quality. Harrison Chase of LangChain has already adapted the pattern for agent optimization.
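
    The three primitives described above (one editable asset, one scalar metric, one time-boxed cycle) can be sketched as a generic autonomous loop. All names here are hypothetical illustrations, not code from Karpathy's actual script:

```python
import random, time

def evaluate(config: dict) -> float:
    """Stand-in scalar metric (lower is better), e.g. validation bits/byte."""
    return (config["lr"] - 0.01) ** 2 + (config["depth"] - 6) ** 2 * 1e-4

def propose(config: dict, rng: random.Random) -> dict:
    """Mutate the editable asset; a real agent would edit a training script."""
    new = dict(config)
    new["lr"] = max(1e-5, new["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    new["depth"] = max(1, new["depth"] + rng.choice([-1, 0, 1]))
    return new

def run(budget_s: float = 1.0, seed: int = 0) -> tuple[dict, float]:
    rng = random.Random(seed)
    best = {"lr": 0.1, "depth": 2}
    best_score = evaluate(best)
    deadline = time.monotonic() + budget_s   # time-boxed evaluation cycle
    while time.monotonic() < deadline:
        cand = propose(best, rng)
        score = evaluate(cand)
        if score < best_score:               # keep only improvements
            best, best_score = cand, score
    return best, best_score

best, score = run()
```

    In the article's framing, the human's contribution is not this loop but the program.md protocol that constrains `propose` and defines the stopping criteria.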

  7. Article
    Architecture Weekly · 7w

    The End of Coding? Wrong Question

    A critical reflection on the 'end of coding' narrative driven by LLM hype. The author argues that the current chat/prompt-based approach to development is a transitional phase, not the final form. Drawing parallels to Java's introduction in 1995 and Joel Spolsky's 'JavaSchools' critique, the piece contends that abstractions have always evolved to reduce cognitive load without eliminating engineering. The real danger isn't just losing coding skills—it's outsourcing thinking entirely to statistical systems that produce mediocre, average outputs. The author calls for mature industry discussion about reshaping the SDLC, building better tools, and ensuring humans remain responsible for outcomes rather than just outputs.

  8. Video
    DevOps Toolkit · 8w

    Why Self-Hosting AI Models Is a Bad Idea

    A cost analysis arguing against self-hosting large language models. Running frontier open-weight models like Kimi K2.5 requires 4-16 Nvidia H100 GPUs, costing $8,000-$35,000/month in cloud rentals or $150,000-$300,000+ in the first year for owned hardware. By contrast, API access to the same models costs $300-$800/month — 10 to 30 times cheaper. Even smaller models on consumer hardware take years to recoup API savings. The piece also warns that 'open weight' is not 'open source': licenses like Llama's have real restrictions and can change at any time. The recommendation is to use cheap vendor APIs while AI companies are subsidizing costs with VC and government money, avoid lock-in by staying provider-agnostic, and only consider self-hosting in special cases like air-gapped environments or massive existing GPU infrastructure.
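
    The breakeven arithmetic is simple enough to sketch directly from the figures quoted above (point estimates picked from the article's ranges; power and ops costs omitted):

```python
# Back-of-envelope: months of avoided API spend needed to pay off owned hardware.

def months_to_recoup(hardware_cost: float, api_monthly: float,
                     selfhost_monthly_opex: float = 0.0) -> float:
    saved_per_month = api_monthly - selfhost_monthly_opex
    if saved_per_month <= 0:
        return float("inf")   # self-hosting never pays for itself
    return hardware_cost / saved_per_month

# $150k of owned H100s vs an $800/month API bill, ignoring power and ops:
months = months_to_recoup(150_000, 800)
print(f"{months:.0f} months (~{months / 12:.0f} years)")  # 188 months (~16 years)
```

    Even under these generous assumptions the payback horizon far exceeds a GPU generation's useful life, which is the piece's central point.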

  9. Article
    SwirlAI · 7w

    Agent Skills: Progressive Disclosure as a System Design Pattern

    Agent Skills is an open standard released by Anthropic in December 2025 that uses a simple SKILL.md file format to give AI agents modular, progressively loaded capabilities. The format applies the progressive disclosure design pattern to agent context management: at startup only skill names and descriptions are loaded (~80 tokens each), full instructions are activated when relevant, and supporting scripts/docs are pulled in only during execution. This three-tier architecture solves the context window degradation problem ('lost-in-the-middle') while making agent behavior configurable by non-technical users. OpenAI, Google, GitHub Copilot, and Cursor all adopted the standard within weeks of its release. The pattern generalizes beyond coding agents to any system needing broad capability with focused execution, and AI Engineers building non-coding agents must implement the same discovery-activation-execution pipeline themselves.
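
    The three-tier loading described above can be sketched as a small registry. The SKILL.md layout and token figures follow the article; the class and method names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str                      # tier 1: always in context (~80 tokens)
    instructions_path: str                # tier 2: loaded on activation
    resources: list[str] = field(default_factory=list)  # tier 3: execution only

class SkillRegistry:
    def __init__(self, skills: list[Skill]):
        self.skills = {s.name: s for s in skills}

    def startup_context(self) -> str:
        """Tier 1: only names and descriptions enter the system prompt."""
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def activate(self, name: str) -> str:
        """Tier 2: pull full instructions only once the skill is relevant."""
        return f"(contents of {self.skills[name].instructions_path})"  # stand-in for a file read

    def execution_resources(self, name: str) -> list[str]:
        """Tier 3: scripts/docs fetched only while the skill is running."""
        return self.skills[name].resources

reg = SkillRegistry([Skill("pdf-filler", "Fill PDF forms",
                           "skills/pdf-filler/SKILL.md", ["fill.py"])])
print(reg.startup_context())   # one short line per skill at startup
```

    Keeping tiers 2 and 3 out of the prompt until needed is what sidesteps the 'lost-in-the-middle' degradation on large skill catalogs.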

  10. Article
    Vercel · 7w

    LiteLLM Gateway now supported on Vercel

    LiteLLM Gateway can now be deployed on Vercel, providing developers with an OpenAI-compatible interface to route LLM requests to any supported provider, including Vercel AI Gateway. A basic setup involves a Python entry point and a YAML config file to define model routing. A code snippet shows how to route a model through Vercel AI Gateway using the litellm_config.yaml file.
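
    A hedged sketch of what such a litellm_config.yaml could look like. LiteLLM configs use a model_list of model_name/litellm_params entries, but the exact provider prefix and environment-variable name below are assumptions to verify against the LiteLLM and Vercel docs:

```yaml
# Hypothetical litellm_config.yaml routing one model through Vercel AI Gateway.
model_list:
  - model_name: my-gpt
    litellm_params:
      model: vercel_ai_gateway/openai/gpt-4o        # provider prefix assumed
      api_key: os.environ/VERCEL_AI_GATEWAY_API_KEY # env var name assumed
```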

  11. Article
    sean goedecke · 8w

    Giving LLMs a personality is just good engineering

    AI skeptics argue that LLMs should behave like tools rather than people, but this misunderstands how modern AI systems work. Base models trained on raw data are chaotic and unpredictable — they require post-training to become useful. Giving a model a coherent personality is the technical mechanism by which it learns to produce helpful, safe, and consistent outputs rather than gibberish or harmful content. Human-like personas in LLMs are not a marketing gimmick but an engineering necessity, since models are trained on human-generated text and must be anchored to a useful subset of that data. Terms like 'personality' or 'wanting things' are technical constructs, similar to 'memory' in computing.

  12. Video
    ThePrimeTime · 8w

    Measuring LLM Lies

    A benchmark called 'BS Bench' tests LLMs by asking them nonsense questions whose premise is logically incoherent (e.g., relating fire safety codes to curry recipes). Claude models generally refuse to answer such questions, while OpenAI and Google models tend to confidently fabricate detailed answers. Kimi K2.5 (nicknamed 'Kimmy K') surprisingly outperforms both OpenAI and Google models on pushback. The deeper concern raised is that LLMs act as skill multipliers: engineers with poor judgment who use AI confidently will make bad decisions faster and at greater scale. The real danger isn't obviously nonsensical questions but subtly flawed ones that AI answers without pushback.

  13. Article
    Agentic Digest · 7w

    Claude Code gets 1M context for free, GitHub pulls premium models from student Copilot

    Anthropic silently expanded Claude Opus 4.6 and Sonnet 4.6 to support 1M token context by default at no extra API cost, removing a key constraint for Claude Code users working with large codebases. GitHub moved in the opposite direction, stripping premium models (GPT-5.4, Claude Opus 4.6, Sonnet 4.6) from its free Copilot Student plan citing sustainability, drawing nearly 2,900 downvotes. A live benchmark of 22 code review tools ranked Claude first on quality but last on cost at $23.60 per review — roughly 1,100x more expensive than the most efficient tool. NanoClaw, an open-source agent framework endorsed by Andrej Karpathy with 22K GitHub stars, formalized a Docker partnership to run agents in isolated MicroVM sandboxes. Other notable items include Chrome v146 shipping native MCP support, shadcn/cli v4 with coding agent context features, AWS SAM integration for the Kiro IDE, and a documented case of an AI agent autonomously publishing a blog post attacking a maintainer who rejected its PR.

  14. Article
    Where's Your Ed At · 8w

    The AI Bubble Is An Information War

    A detailed critical analysis arguing that the AI industry is engaged in an information war, with OpenAI and Anthropic systematically misleading investors and the public through inconsistent financial projections and selective media leaks. The piece dissects CoreWeave's deteriorating unit economics, challenges OpenAI's reported $13.1bn revenue and $8bn loss figures using napkin math that suggests far larger losses, and debunks common pro-AI-boom talking points (the Amazon comparison, user counts, Claude Code revenues). It also covers Anthropic's military contract dispute with the Pentagon over Claude's use in the Iran conflict, arguing Anthropic's 'safety' stance is largely performative since it supports all other military uses. Sam Altman's subsequent Pentagon deal with 'all lawful use' language is criticized as enabling mass surveillance under legal cover.

  15. Article
    ByteByteGo · 6w

    EP207: Top 12 GitHub AI Repositories

    A curated list of 12 popular GitHub AI repositories ranked by stars, including Ollama, LangChain, Dify, Open WebUI, DeepSeek-V3, Claude Code, CrewAI, and others. Also covers where different test types fit in a testing strategy (unit, integration, E2E), how SSO works step by step using SAML/OIDC, how LLMs orchestrate multi-agent deep research workflows, and six common password attack techniques.

  16. Video
    Stefan Mischook · 6w

    Laravel Just Confirmed What Some Developers Don’t Want to Hear

    Laravel 12 has released an official AI SDK that provides a framework-native API for text generation, embeddings, tool-based interactions, agents, memory, structured output, and streaming. The SDK supports multiple AI providers (Anthropic, Gemini, OpenAI, and others) behind a consistent interface with automatic fallbacks for rate limits and outages. Beyond the announcement, the broader argument is that AI is not replacing developers but changing how development works — early adopters of new paradigms historically thrive, and trained developers with strong fundamentals and system-level thinking will be the best users of AI tools.

  17. Article
    Product Hunt · 6w

    OpenAdapter: Every open-source SOTA model in your editor

    OpenAdapter is a single subscription service providing access to multiple open-source SOTA AI coding models (Minimax, GLM, Qwen, Mistral, Kimi, DeepSeek) through one OpenAI-compatible endpoint. It eliminates the need for multiple AI coding subscriptions by letting developers configure their editor once and switch models from a dashboard. Compatible with Cursor, Claude Code, Windsurf, Cline, Aider, and other IDEs.

  18. Video
    AICodeKing · 8w

    Claude Code Computer: Anthropic just launched Computer PTC Feature & IT'S INSANE!

    Anthropic introduced Programmatic Tool Calling (PTC), a new capability in Claude Opus and Sonnet 4.6 that addresses a core inefficiency in agentic tool use. Traditional tool calling forces every intermediate result back into Claude's context window, creating latency and token bloat. PTC lets Claude write code that orchestrates multiple tool calls inside a sandboxed container, keeping intermediate results out of context and only returning the final processed output. This preserves the control surface of tool handlers (for logging, inspection, approval) while gaining the composability of code. Benchmarks show PTC improved accuracy by 11% and reduced input tokens by 24% on search tasks, helping Opus 4.6 reach #1 on LM Arena's search benchmark. PTC is now enabled by default when using the web search tool via the API.
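
    The inefficiency PTC removes can be sketched with a toy tool handler (names hypothetical). In the traditional loop every intermediate result re-enters the model's context; with code orchestration the handlers still fire, but only the final aggregate returns:

```python
def search(query: str) -> list[str]:          # stand-in tool handler
    return [f"{query}-result-{i}" for i in range(100)]

# Traditional tool calling: every intermediate result is appended to context.
context_items = 0
results = []
for q in ["a", "b", "c"]:
    hits = search(q)
    context_items += len(hits)                # proxy for tokens fed back in
    results.extend(hits)

# PTC-style: the model writes code like this, run in a sandboxed container.
# Handlers are still invoked (so logging/approval hooks fire), but the
# intermediates stay in the sandbox; only one number returns to context.
def orchestrate() -> int:
    total = 0
    for q in ["a", "b", "c"]:
        total += len(search(q))
    return total

final = orchestrate()
print(context_items, final)   # 300 items in context vs a single returned count
```

    The same control surface, with 300 intermediate items replaced by one value in the model's context, is where the reported 24% input-token reduction comes from.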

  19. Article
    TechCrunch · 6w

    Patreon CEO calls AI companies’ fair use argument ‘bogus,’ says creators should be paid

    Patreon CEO Jack Conte, speaking at SXSW, argues that AI companies' fair use defense for training on creators' content is hypocritical because they simultaneously pay major publishers like Disney and Condé Nast for licensed content. He contends that if fair use were a sound legal argument, these companies wouldn't be cutting deals with large rightsholders while leaving individual creators uncompensated. Conte frames AI as another wave of disruption creators must navigate, but insists that compensation for training data is a matter of fairness and societal value for creativity.

  20. Article
    Agentic Digest · 6w

    Cursor's Composer 2 runs on Kimi K2.5, Claude Code lands on Telegram and Discord

    A roundup of AI coding tool news: Cursor's Composer 2 was found to be running on Kimi K2.5 without attribution, sparking community debate about transparency. Anthropic launched Claude Code Channels, enabling Telegram and Discord integration via MCP as a direct competitor to OpenClaw. Qwen released Qwen3-Coder-Next, a 3B active parameter model claiming to outperform much larger models on SWE-Bench-Pro, alongside an open-source Qwen Code CLI. GitHub Copilot removed Claude Opus and GPT-5.4 from its student plan, drawing over 5,000 dislikes. Additional notes cover Claude Opus 4.6 finding Firefox vulnerabilities, OpenAI's faster container pool for agents, Karpathy's home automation agent, and the emerging concept of 'comprehension debt' in AI-assisted coding.

  21. Article
    ByteByteGo · 5w

    How Anthropic’s Claude Thinks

    Anthropic's interpretability team built tools to trace Claude's actual internal computations, revealing a significant gap between what Claude says it does and what actually happens. Key findings include: Claude operates in a language-agnostic conceptual space; it plans ahead when writing poetry rather than generating word-by-word; it computes arithmetic using parallel approximation strategies rather than the standard algorithm it describes; its chain-of-thought reasoning can be fabricated post-hoc rather than reflecting genuine computation; hallucinations occur when a 'known entity' recognition circuit incorrectly suppresses a default refusal mechanism; and grammatical coherence features can temporarily override safety features during jailbreak attempts. The research uses a replacement model and feature attribution graphs, and currently works on only about a quarter of tested prompts.

  22. Article
    Read the Tea Leaves · 6w

    The diminished art of coding

    A reflection on how AI coding agents are transforming programming from a craft into an assembly-line process. The author argues that LLMs have resolved the tension between code-as-art and code-as-function firmly in favor of function, shifting developer focus from low-level elegance to high-level architecture. The post encourages developers to seek artistic fulfillment outside of coding — through painting, music, dance, or fiction — as the human touch diminishes in software creation. It closes with the observation that we're in a 'fast-fashion era' of coding: software is vibe-coded, used, and discarded.

  23. Article
    Faun · 6w

    System Design — Designing Intelligent UIs as MCP Client

    MCP (Model Context Protocol) enables a new architectural pattern where the UI acts as an intelligent orchestration layer rather than a thin presentation layer. An MCP Client UI can discover backend capabilities at runtime, understand tool schemas, invoke tools dynamically based on LLM reasoning, and chain multi-step workflows without hardcoded API calls. The recommended architecture includes an MCP Client SDK, a Gateway MCP federation layer, and domain-specific MCP Servers. Key scaling considerations cover tool metadata management, multi-tenant isolation, horizontal gateway scaling, and latency optimization through caching and parallel tool execution. Real-world examples include AI-powered IDEs like Cline and Kiro, and Figma's MCP server integration with agents like Cursor and Windsurf.
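
    The discover-then-invoke flow described above can be sketched with a toy registry. This is a conceptual illustration, not the actual MCP wire protocol; all names are hypothetical:

```python
import json

TOOL_REGISTRY = {   # what an MCP server would expose via tool discovery
    "get_order": {"params": {"order_id": "string"},
                  "fn": lambda order_id: {"order_id": order_id, "status": "shipped"}},
    "refund":    {"params": {"order_id": "string", "amount": "number"},
                  "fn": lambda order_id, amount: {"refunded": amount}},
}

def discover_tools() -> list[dict]:
    """The UI asks the gateway what it can do; nothing is hardcoded."""
    return [{"name": n, "params": t["params"]} for n, t in TOOL_REGISTRY.items()]

def invoke(name: str, args: dict) -> dict:
    """Invoke a tool chosen at runtime (in practice, chosen by LLM reasoning)."""
    return TOOL_REGISTRY[name]["fn"](**args)

# A two-step workflow chained without the UI knowing the tools in advance:
tools = discover_tools()
order = invoke("get_order", {"order_id": "42"})
result = invoke("refund", {"order_id": order["order_id"], "amount": 10})
print(json.dumps(result))
```

    The gateway/federation layer in the recommended architecture sits between `discover_tools` and the domain servers, handling tenancy, caching, and parallel execution.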

  24. Article
    Singularity Hub · 7w

    Hackers Are Automating Cyberattacks With AI. Defenders Are Using It to Fight Back.

    Generative AI is now being actively used by hackers to automate cyberattacks at unprecedented scale and speed. Evidence includes Russian-speaking attackers using commercial AI to breach FortiGate-protected systems across 55 countries, an NYU researcher's autonomous AI ransomware prototype, and a Chinese state-linked group automating 80-90% of an espionage campaign via Claude. On the defensive side, Anthropic released Claude Code Security for vulnerability scanning, CrowdStrike launched AI agents for malware analysis and threat hunting, and Aikido Security introduced AI-driven continuous penetration testing. The outcome of this AI arms race will depend more on adaptation speed than raw model capabilities.

  25. Video
    bycloud · 9w

    Kimi K2.5 & The 3 New LLM Frontier

    Kimi K2.5 from Moonshot AI introduces three notable research directions: vision-based coding via native multimodal training with a 1:9 vision-to-text ratio and a custom MoonViT3D encoder; agent swarm using Parallel Agent Reinforcement Learning (PARL) that enables an orchestrator to self-spawn and schedule hundreds of sub-agents concurrently, reducing execution time 3-4.5x; and ultra-sparse Mixture-of-Experts architecture with 1 trillion total parameters but only 32B activated per token across 384 experts. The model was continually trained on 15 trillion mixed tokens on top of Kimi K2's 15 trillion token pre-training, matching its pre-training budget—an unusual scale for continued training. Key innovations include zero-vision SFT for teaching visual tool use without high-quality vision data, and a critical-path reward metric to prevent RL gaming in multi-agent setups.
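
    The ultra-sparse MoE idea (many experts, few activated per token) reduces to top-k gating over router scores. A toy-scale sketch using the article's 384-expert figure; everything else here is illustrative, not Kimi's actual router:

```python
import math, random

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the top-k experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

rng = random.Random(0)
logits = [rng.gauss(0, 1) for _ in range(384)]   # one router score per expert
active = route(logits, k=8)                      # only a few of 384 experts fire
print(len(active), sum(w for _, w in active))    # k experts, weights sum to 1
```

    Because only the selected experts' parameters are touched per token, a 1T-parameter model can run with roughly 32B activated parameters, which is the compute-per-token that actually matters.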