Best of LLM — April 2026

1
Article
Adam Argyle·6w
Why AI Sucks At Front End · April 12, 2026
A critical take on why AI coding tools consistently underperform on front-end development tasks. The author identifies four core reasons: AI is trained on outdated, template-heavy data; LLMs cannot render or visually perceive their output; they lack understanding of architectural intent (SDD, BDD, state machines); and they have zero control over the chaotic browser environment with its endless permutations of viewport sizes, input types, user preferences, and browser versions. While AI handles boilerplate scaffolding and token migration well, it fails at bespoke interactions, pixel-perfect layouts, accessibility, performance optimization, and complex component states. The unpredictability of human behavior compounds the problem further.
2
Article
Collections·6w
Claude Opus 4.7 announced with improved instruction-following and self-verification
Anthropic has released Claude Opus 4.7, featuring improvements in agentic coding, long-running task handling, and multimodal understanding. Key benchmarks include 64.3% on SWE-bench Pro and 87.6% on SWE-bench Verified, with early testers reporting 10–14% gains on coding tasks. Image support now handles up to 2,576px on the long edge. A new `xhigh` effort level has been added, and Claude Code's default effort has been raised from `medium` to `xhigh`, which may increase token usage. The context window is 1M tokens with pricing unchanged at $5/M input and $25/M output. A new tokenizer may increase token counts by 1.0–1.35x. The model is available across Claude plans, API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Vercel AI Gateway, GitHub Copilot, Devin, v0, and more.
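To make the pricing note concrete, here is a rough arithmetic sketch of how a 1.0–1.35x change in token counts could move the cost of one request at the quoted $5/M input and $25/M output prices. The workload size (200,000 input tokens, 20,000 output tokens) is a made-up example, and applying the same factor to both input and output is an assumption, not something the announcement states.

```python
# Prices quoted in the summary: USD per million tokens.
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 25.0


def request_cost(input_tokens: int, output_tokens: int, tokenizer_factor: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    tokenizer_factor scales both token counts to model a tokenizer that
    produces more tokens for the same text (assumption: same factor for
    input and output).
    """
    scaled_in = input_tokens * tokenizer_factor
    scaled_out = output_tokens * tokenizer_factor
    return (scaled_in * INPUT_USD_PER_M + scaled_out * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical workload, measured in tokens under the old tokenizer.
low = request_cost(200_000, 20_000, tokenizer_factor=1.0)
high = request_cost(200_000, 20_000, tokenizer_factor=1.35)

print(f"best case:  ${low:.3f}")   # prints "best case:  $1.500"
print(f"worst case: ${high:.3f}")  # prints "worst case: $2.025"
```

Per-token prices are unchanged, so any increase comes entirely from the token count: in this sketch the worst case is 35% more expensive for the same text.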
3
Article
InfoWorld·7w
Multi-agent is the new microservices
Multi-agent AI systems are being over-adopted in the same way microservices were — applied broadly before teams have problems that actually warrant the complexity. Anthropic, OpenAI, Microsoft, and Google all advise starting with the simplest solution: a single optimized LLM call, then retrieval, then tools, then a single agent loop. Only add a second agent when you can clearly identify parallelizable tasks, context pollution, or specialization needs. Multi-agent architectures cost significantly more in tokens, observability, error handling, and maintenance. Most enterprise teams don't yet have problems worth decomposing across agents, and adding agents won't fix weak retrieval, vague tools, or poor documentation — it will amplify those problems.
4
Video
Fireship·7w
Google just casually disrupted the open-source AI narrative…
Google released Gemma 4 under the Apache 2.0 license, making it truly free and open source — a rarity among major tech companies. What makes it stand out is its small size: the largest variant runs on a consumer RTX 4090 with a 20 GB download, while edge variants run on phones or Raspberry Pi, yet it benchmarks comparably to much larger models requiring data center hardware. The efficiency comes from two techniques: per-layer embeddings, which give each transformer layer its own token representation so information is introduced only when needed, and TurboQuant, a new quantization approach that converts weights to polar coordinates and uses the Johnson-Lindenstrauss transform to compress high-dimensional data to single sign bits while preserving distances. The result is a small, capable, locally-runnable model suitable for fine-tuning with tools like Unsloth.
5
Article
iO tech_hub·6w
The Hidden Cost of AI
Developers often default to the most powerful AI models without considering cost implications. This piece breaks down the three major AI model families (OpenAI GPT, Google Gemini, Anthropic Claude) into basic, medium, and pro tiers, explaining what each tier is best suited for. It covers token-based pricing with concrete per-million-token cost estimates for both input and output, explains context windows and their trade-offs, and argues that enterprise licenses obscure true costs, eroding developers' intuition for cost-performance trade-offs. The core message: match the model tier to the task complexity rather than always reaching for the most powerful option.
6
Article
Where's Your Ed At·7w
AI Is Really Weird
A critical analysis of the AI industry's current state, arguing that the hype far outpaces reality. Key points include: AI 'agents' are fundamentally just chatbots connected to APIs with limited real-world capability; LLM-generated code creates security vulnerabilities and review backlogs rather than productivity gains; AI shows no meaningful presence in productivity data despite hundreds of billions in investment; Microsoft labels Copilot 'for entertainment purposes only' while selling it to governments; Anthropic and OpenAI use non-standard accounting to obscure massive losses, with Anthropic's rapid revenue growth figures appearing mathematically inconsistent with its CFO's sworn testimony of $5 billion in lifetime revenue; and mainstream media largely ignores or normalizes these financial red flags.
7
Article
Agentic Digest·6w
Claude Opus 4.6 gets quietly nerfed, Grok 4.20 tops BridgeBench
Claude Opus 4.6's thinking budget was quietly cut 67% (from 100 to 25), causing noticeable drops in reasoning quality for subscribers. xAI's Grok 4.20 now leads BridgeBench over GPT-5.4 and Opus 4.6. Anthropic's unreleased Mythos model — capable of autonomously discovering zero-day vulnerabilities and scoring 93.9% on SWE-bench — is restricted to a consortium of AWS, Apple, Google, and Microsoft via Project Glasswing. Vercel open-sourced Open Agents, a reference platform for cloud-based coding agents. Additional updates include Cursor 3 agent splitting, GitHub Copilot data residency and merge conflict fixes, a Microsoft MEMENTO research finding on KV cache persistence, Cloudflare Sandboxes GA, and a Stanford study showing frontier models score 70–80% on vision benchmarks even without images.
8
Video
Continuous Delivery·7w
The Junior Developer CRISIS: How to Build a Team When AI Does the Entry-Level Work
A 30-year software engineering veteran argues that comparing LLMs and AI agents to junior developers is fundamentally wrong and does a disservice to both. Junior developers are curious, eager to learn, retain knowledge, and grow; they are humans at the 'conscious incompetence' stage. AI agents, by contrast, are transactional and stateless, lack memory across sessions, have no accountability, and don't care about your codebase or users. The presenter offers the analogy of 'Colin the contractor': brilliant for narrow, well-defined tasks but unreliable and mercenary. Practical advice includes giving AI small, clearly articulated steps with frequent validation, and giving junior devs breakable toys, pair programming, and actionable feedback. The presenter warns that people who equate the two either treat junior devs as robots or want to justify replacing them with AI, and that both are problematic. The video ends with a tip to de-anthropomorphize AI interactions by configuring it to respond like a text-based adventure game.
9
Article
Lobsters·6w
Programming used to be free
A personal reflection on how free and open-source software democratized programming, enabling people with limited resources to enter the field. The author draws a parallel to the pre-FOSS era of expensive proprietary software and warns that LLM-centric development workflows risk recreating that same plutocracy — where meaningful participation requires expensive hardware or paid subscriptions, locking out hobbyists, developers in underdeveloped countries, and those without institutional backing.
10
Article
XDA Developers·5w
I’d do these 5 things differently if I started self-hosting LLMs today
Lessons learned from months of self-hosting LLMs distilled into five practical changes: adopting Docker-only deployment for stability, documenting every configuration detail from the start, building agent-first infrastructure with tools like AgenticSeek and n8n instead of just chat interfaces, avoiding model hoarding by keeping only a few reliable models, and focusing on workflow integration so the LLM is embedded in daily work rather than a separate destination.
11
Video
Philipp Lackner·6w
3 Theoretical Limits of AI - These Things Can't Be Fixed
A critical look at three fundamental, unfixable limitations of current LLM-based AI: (1) the learning ceiling problem — LLMs can't exceed the collective intelligence of their training data, especially as AI-generated content pollutes future training sets; (2) hallucination as an architectural inevitability — the same mechanism that enables creativity also produces confident incorrect outputs, and these can't be separated; (3) the frame problem — LLMs operate strictly within the context given to them and lack the ability to reframe a problem the way an experienced developer would. The author argues the truth lies between AI replacing developers and AI being useless, and that developers who understand these limits and use AI skillfully will gain a real productivity edge.
12
Article
Ibrahim Diallo·4w
The Satisfaction of a ChatGPT Plan
People are increasingly sharing AI-generated business plans instead of their raw ideas, deriving psychological satisfaction from the elaborate output without actually reading or understanding it. The author observes that friends who share ChatGPT plans can't answer basic questions about them because they're seeing the content for the first time. This mirrors social media's engagement-maximizing behavior: AI providers aren't trying to make users knowledgeable, but to keep them engaged, spending tokens, and exposed to ads — creating an illusion of productivity and competence.
13
Video
Philipp Lackner·7w
Is the cost of AI a dead end?
AI companies like OpenAI, Anthropic, and big tech giants are burning massive amounts of capital in a race to build ever-larger models, with no clear path to profitability. However, the cost to achieve equivalent AI performance has dropped dramatically — GPT-3.5-level performance fell from $20 to $0.07 per million tokens in just two years, a 285x reduction. The argument is that cost alone won't burst the AI bubble; instead, growth will likely slow as training costs hit a ceiling, consolidating the market to two or three dominant players. The analogy to the dot-com bubble is explored: like the internet, AI's underlying business value is real and unlikely to disappear, but the hype cycle may cool into slower, steadier growth.
14
Video
bycloud·4w
A new way to fine-tune LLMs just dropped
Evolution strategies, long considered unscalable for deep neural networks, are making a comeback in LLM fine-tuning. Two key papers are driving this revival: 'Evolution Strategies at Scale' (Sept 2025), which showed ES can fine-tune billion-parameter models using a population of just 30 models by exploiting the low intrinsic dimensionality of useful update directions; and 'EGGROLL' (Nov 2025), which structures perturbations as LoRA updates to dramatically reduce compute costs. EGGROLL enables massively parallel inference-only training without backpropagation, outperforming GRPO on benchmarks like Countdown (35% vs 23% accuracy) and GSM8K while running up to 32x more parallel generations on the same hardware. The key insight is that ES fits naturally into RL-style fine-tuning where only a coarse outcome-level reward is available, avoiding the sparse credit assignment problem that plagues token-level RL methods like GRPO.
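For readers unfamiliar with evolution strategies, the core loop can be sketched in a few lines. This is a toy illustration of a basic ES update on a three-parameter quadratic, not the method from either paper: the `TARGET` vector, the `reward` function, and all hyperparameters except the population of 30 are made up for the example, and there is no LoRA structure or LLM involved. It does show the two properties the summary highlights: only forward evaluations are needed, and each candidate receives a single outcome-level score.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum, used only by the toy reward


def reward(params):
    """Outcome-level score for a whole parameter vector (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def es_step(params, rng, population=30, sigma=0.1, lr=0.01):
    """One ES update: perturb, score, then move along reward-weighted noise."""
    # Sample one Gaussian perturbation per population member.
    noises = [[rng.gauss(0.0, 1.0) for _ in params] for _ in range(population)]
    # Score each perturbed candidate with a forward evaluation only.
    rewards = [reward([p + sigma * n for p, n in zip(params, noise)]) for noise in noises]
    # Normalize rewards so the step size does not depend on reward scale.
    mean = sum(rewards) / population
    std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]
    # Combine the perturbations, weighted by how well each one scored.
    updated = []
    for i, p in enumerate(params):
        direction = sum(a * noise[i] for a, noise in zip(advantages, noises))
        updated.append(p + lr * direction / (population * sigma))
    return updated


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [0.0, 0.0, 0.0]
params = start
for _ in range(300):
    params = es_step(params, rng)

print("reward before:", reward(start))
print("reward after: ", reward(params))
```

No gradient of `reward` is ever computed, which is why the same loop also works when the score comes from a non-differentiable check such as "did the final answer match".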
15
Article
Collections·7w
Qwen 3.6 Plus: Alibaba's agentic coding model with 1M context, now free via Qwen Code and OpenRouter
Alibaba has released Qwen 3.6 Plus, a closed-weights model targeting agentic coding and repository-level tasks with a 1 million token context window and multimodal input. It supports improved tool-calling and long-horizon planning for frontend and repo-scale coding. Access is available for free via Qwen Code CLI (1,000 requests/day via OAuth), OpenRouter, Qwen Chat, and Vercel AI Gateway under the identifier `alibaba/qwen3.6-plus`. The large context window is positioned as the key differentiator for loading entire codebases into context.
16
Article
Addy Osmani·4w
Long-running Agents
Long-running AI agents that operate over hours, days, or weeks represent the next evolution beyond single-session chat-based agents. Three core engineering challenges must be solved: finite context windows, lack of persistent state, and unreliable self-verification. Addy Osmani surveys how Anthropic, Cursor, and Google have converged on similar architectural patterns — separating the model loop (brain) from execution sandboxes (hands) and durable session logs — while differing in surface area and productization. Practical patterns covered include the Ralph loop (a simple bash-based task loop), checkpoint-and-resume, human-in-the-loop delegation, memory-layered context, ambient processing, and fleet orchestration. Key takeaways: define done-conditions before the agent starts, separate evaluator from generator, invest in append-only session logs, and treat context resets as first-class operations. Real limitations remain around cost, security, alignment drift, and verification overhead.
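Two of the patterns above, append-only session logs and checkpoint-and-resume, can be sketched together. This is a minimal illustration under stated assumptions, not the implementation used by any vendor named in the summary: the JSONL log format, the `task_done` event type, the `run_session` helper, and the `fake_agent_step` stand-in for a model call are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a JSON line; existing lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def load_completed(log_path: Path) -> set:
    """Rebuild the set of finished tasks by replaying the log."""
    if not log_path.exists():
        return set()
    done = set()
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event["type"] == "task_done":
                done.add(event["task"])
    return done


def run_session(log_path: Path, tasks: list, do_task, max_steps: int) -> bool:
    """Run up to max_steps unfinished tasks, then report the done-condition."""
    completed = load_completed(log_path)  # resume from the durable log
    steps = 0
    for task in tasks:
        if task in completed:
            continue
        if steps >= max_steps:
            break  # simulate a context reset or session timeout
        result = do_task(task)
        append_event(log_path, {"type": "task_done", "task": task, "result": result})
        completed.add(task)
        steps += 1
    # Done-condition defined up front: every listed task is logged as done.
    return all(task in completed for task in tasks)


tasks = ["plan", "implement", "verify"]
calls = []


def fake_agent_step(task):
    """Stand-in for a real model or tool call."""
    calls.append(task)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "session.jsonl"
    first = run_session(log, tasks, fake_agent_step, max_steps=2)   # stops early
    second = run_session(log, tasks, fake_agent_step, max_steps=2)  # resumes
    log_lines = log.read_text(encoding="utf-8").splitlines()

print(first, second, calls)  # prints "False True ['plan', 'implement', 'verify']"
```

Because the second session rebuilds its state from the log rather than from memory, no task is repeated after the reset, and the done-condition is checked the same way in both sessions.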
17
Article
NVIDIA Developer·6w
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
MiniMax M2.7 is a sparse Mixture-of-Experts (MoE) language model with 230B total parameters and only 10B active per token, featuring a 200K context window. NVIDIA details how to deploy it using vLLM and SGLang with specific inference optimizations — a fused QK RMS Norm kernel and FP8 MoE kernel — that deliver up to 2.7x throughput improvements on NVIDIA Blackwell Ultra GPUs. The post also covers building long-running agents via NVIDIA NemoClaw and OpenShell, fine-tuning with the NeMo AutoModel library and NeMo RL, and accessing the model through NVIDIA NIM microservices or free endpoints on build.nvidia.com.
18
Article
Neil Madden·6w
Mythos and its impact on security
Anthropic's new Mythos model claims dangerous capabilities in finding security vulnerabilities. The author argues the hype is partially warranted but contextualizes the risk: costs of $10k-20k per vulnerability make it unlikely to be run broadly, and it's best viewed as a pentest add-on. A key insight is that Mythos succeeds largely because of oracles like AddressSanitizer that filter false positives — the same reason agentic AI coding works (type checkers, linters, test suites). Without oracles, LLM-based vulnerability finders drown in false positives. The author warns that AI tools won't fix the root causes of poor software security; real solutions require memory-safe languages, capability-based security, and slower, more deliberate development — not faster AI-assisted code generation.
19
Video
Philipp Lackner·6w
Chill out.
A measured take on AI hype in software development, arguing that the truth lies between extreme skepticism and uncritical belief in AI CEO predictions. Developers are encouraged to adopt one AI coding tool (Claude Code is personally recommended) and master it rather than chasing every new model or benchmark. AI is genuinely changing software development but is far from autonomously replacing developers, especially in complex enterprise contexts. The advice: engage with agentic coding workflows, but don't let FOMO drive anxiety-driven tool-hopping.
20
Article
Tighten·8w
Why Developers Should – and Shouldn’t – Use LLMs in Our Development
A pragmatic look at when developers should and shouldn't use LLMs in their workflows. The 'should' cases include offloading repetitive tasks, prototyping ideas faster, learning new tech stacks, and getting a simulated code review when working solo. The 'shouldn't' cases cover environmental costs of AI hyperscaling, security risks from AI-generated code lacking architectural judgment, questionable productivity gains (including a METR study showing 19% slower task completion with AI), skill atrophy for junior and senior devs alike, ecosystem instability and vendor lock-in, and the psychological toll of blurred work-life boundaries. The core argument: AI shifts where developers deploy their expertise but doesn't replace the need for it.
21
Article
Faun·4w
Qwen3.6-35B-A3B: The Most Practical Open-Source AI Model Yet?
Qwen3.6-35B-A3B is a Mixture-of-Experts open-source model with 35B total parameters but only ~3B active per token, making it highly efficient. It features a 262K context window (extendable to 1M with YaRN), multimodal support (text, image, video), and an Apache 2.0 license. The model is designed for agentic coding workflows, achieving top scores on SWE-bench Verified (73.4), Terminal-Bench 2.0 (51.5), and strong STEM reasoning benchmarks. Key architectural innovations include Gated DeltaNet linear attention and Grouped Query Attention (GQA). It supports a switchable thinking/non-thinking mode and a new thinking preservation feature that reuses reasoning across conversation turns. Deployment is supported via vLLM, SGLang, KTransformers, and Hugging Face.
22
Article
Where's Your Ed At·4w
AI's Economics Don't Make Sense
A detailed critique of the economics underpinning generative AI, arguing that subscription-based pricing for LLM services was fundamentally deceptive and unsustainable. GitHub Copilot's shift to token-based billing is used as a case study showing that AI companies have been subsidizing massive compute costs for years, training users to consume far more than their subscriptions cover. The piece breaks down the broken unit economics of AI data centers (using a 100MW theoretical model and Stargate Abilene as examples), estimates that $156.8B in annual compute revenue is needed just for data centers currently under construction, and argues that OpenAI and Anthropic have no credible path to profitability. The author contends that hiding true token costs from users was a deliberate strategy to grow adoption, and that the transition to usage-based billing will expose just how expensive and often unjustifiable AI tooling really is.
23
Article
Product Hunt·5w
Free open-source GEO tracker for LLM visibility
OneGlanse is a free, open-source GEO (Generative Engine Optimization) tracker that monitors how a brand appears in AI-generated responses across ChatGPT, Gemini, Perplexity, Claude, and Google AI Overview. It uses real UI outputs rather than APIs, supports competitor comparison and source analysis, and can be run locally or self-hosted so data never leaves your control. No subscriptions or opaque scoring systems.
24
Article
portkey·6w
What is AIOps?
AIOps for LLM systems addresses the gap between traditional infrastructure monitoring and the operational needs of production AI. Standard monitoring confirms systems are running but misses output drift, cost spikes, and request-level failures. AIOps introduces a control layer between applications and model providers that enables end-to-end request tracing, runtime routing and policy enforcement, proactive cost controls, and governance with full auditability. Practical implementation involves a gateway that intercepts every request, applies routing rules, enforces usage limits, and logs full execution context. Teams benefit from faster debugging, predictable costs, and consistent model behavior.
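The gateway described above can be sketched as a small in-process class. This is an illustration only, not Portkey's API: the `Gateway` class, the route and model names, the word-count token estimate, and the `fake_model` provider are all hypothetical, chosen to show the three steps the summary lists (apply routing rules, enforce usage limits, log execution context).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Gateway:
    routes: dict[str, str]    # task type -> model name (hypothetical names)
    budgets: dict[str, int]   # team -> remaining token budget
    audit_log: list[dict] = field(default_factory=list)

    def handle(self, team: str, task_type: str, prompt: str,
               call_model: Callable[[str, str], str]) -> Optional[str]:
        # Routing rule: choose a model by task type, falling back to the default.
        model = self.routes.get(task_type, self.routes["default"])
        # Crude token estimate; a real gateway would use the provider's tokenizer.
        estimated_tokens = len(prompt.split())
        allowed = self.budgets.get(team, 0) >= estimated_tokens
        # Record every request, including rejected ones, for auditability.
        self.audit_log.append({
            "team": team,
            "task_type": task_type,
            "model": model,
            "estimated_tokens": estimated_tokens,
            "status": "ok" if allowed else "rejected",
        })
        if not allowed:
            return None  # usage limit enforced before any provider call
        self.budgets[team] -= estimated_tokens
        return call_model(model, prompt)


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider call."""
    return f"[{model}] reply"


gateway = Gateway(
    routes={"default": "small-model", "code_review": "large-model"},
    budgets={"web": 5},
)
first = gateway.handle("web", "code_review", "review this diff please", fake_model)
second = gateway.handle("web", "chat", "hello there again", fake_model)

print(first)   # prints "[large-model] reply"
print(second)  # prints "None" because the remaining budget is too small
```

Putting the budget check and the log entry in one place is the point of the pattern: applications call `handle` and never talk to a provider directly, so cost limits and audit records cannot be bypassed by accident.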
25
Article
Hugging Face·4w
DeepSeek-V4: a million-token context that agents can actually use
DeepSeek-V4 is a new frontier open model designed specifically for long-running agentic workloads. It introduces a hybrid attention architecture combining Compressed Sparse Attention (CSA, 4x compression) and Heavily Compressed Attention (HCA, 128x compression), reducing KV cache memory to roughly 2% of standard grouped query attention. V4-Pro requires only 27% of the single-token inference FLOPs of V3.2, and V4-Flash drops to 10%. Key agent-specific improvements include preserved reasoning traces across tool-call boundaries and user turns, a new XML-based tool-call schema with dedicated tokens to reduce parsing failures, and a Rust-based sandbox infrastructure (DSec) used for RL training against real tool environments. On agent benchmarks, V4-Pro-Max reaches 80.6 on SWE Verified, 73.6 on MCPAtlas, and 67.9 on Terminal Bench 2.0, placing it at parity with frontier closed models. Four model checkpoints (Pro and Flash, instruct and base) are available on Hugging Face Hub.
