Best of LLMAugust 2025

  1. 1
    Article
    Avatar of collectionsCollections·36w

    DeepSeek-V3.1 Release: A New Era in Open AI Technology

    DeepSeek has released V3.1, a 685-billion parameter open-source language model that rivals proprietary systems from OpenAI and Anthropic at significantly lower costs. The model features a hybrid architecture combining chat, reasoning, and coding capabilities, supports 128,000 token context, and achieves 71.6% on coding benchmarks. Available for free on Hugging Face with API compatibility, it's optimized for Chinese chips and represents a major step toward democratizing advanced AI technology.

  2. 2
    Article
    Avatar of controversycontroversy.dev·39w

    Enough is enough. Prompt engineering is not engineering.

    Argues that prompt engineering is fundamentally different from traditional software engineering, lacking the systematic design, mathematical rigor, and testable logic that define real engineering disciplines. The author contends that calling prompt writing 'engineering' is misleading marketing that inflates the perceived technical complexity of working with AI language models.

  3. 3
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·37w

    8 RAG Architectures for AI Engineers

    Eight different RAG (Retrieval-Augmented Generation) architectures are explained with their specific use cases: Simple Vector RAG for basic semantic matching, Multi-modal RAG for cross-modal retrieval, HyDE for handling dissimilar queries, Self-RAG for validation against trusted sources, Graph RAG for structured relationships, Hybrid RAG combining vector and graph approaches, Adaptive RAG for dynamic query handling, and Agentic RAG for complex workflows with AI agents.

  4. 4
    Article
    Avatar of bytebytegoByteByteGo·36w

    How LLMs See Images, Audio, and More

    Modern AI systems process images, audio, and video by converting them into discrete tokens, similar to text processing. Images use patch embeddings (dividing into grid squares), vector quantization (learning visual codebooks), or contrastive embeddings. Audio employs neural codecs for quality preservation, ASR transcription for semantic content, or hierarchical approaches for multi-scale representation. Each tokenization method involves trade-offs between computational efficiency, information preservation, and semantic understanding, with the optimal choice depending on specific use cases and requirements.

  5. 5
    Article
    Avatar of hnHacker News·38w

    microsoft/poml: Prompt Orchestration Markup Language

    POML (Prompt Orchestration Markup Language) is Microsoft's new markup language for structured prompt engineering with Large Language Models. It features HTML-like syntax with semantic components, comprehensive data handling for various file types, CSS-like styling system, built-in templating engine, and development tools including VS Code extension and SDKs for Node.js and Python. POML aims to solve common prompt development challenges by providing better structure, maintainability, and data integration capabilities.

  6. 6
    Article
    Avatar of medium_jsMedium·36w

    5 Agent Workflows You Need to Master (And Exactly How to Use Them)

    Five structured AI agent workflows are presented to replace ad-hoc prompting: prompt chaining breaks complex tasks into sequential steps, routing directs queries to appropriate models based on complexity, parallelization runs independent tasks simultaneously, orchestrator-workers use a planning model to coordinate specialized workers, and evaluator-optimizer creates feedback loops for quality improvement. Each workflow includes Python code examples and addresses specific use cases like code generation, content creation, and data analysis to achieve more consistent and production-ready results.

  7. 7
    Article
    Avatar of zedZed·37w

    Why LLMs Can't Really Build Software — Zed's Blog

    Large Language Models excel at writing code but struggle with the iterative mental modeling that defines effective software engineering. While LLMs can generate code and update it when given specific problems, they cannot maintain clear mental models of requirements versus implementation, leading to confusion when tests fail or debugging is needed. Current models suffer from context omission, recency bias, and hallucination issues that prevent them from understanding complex software systems. For non-trivial projects, human engineers must remain in control, using LLMs as tools while maintaining responsibility for requirements clarity and code verification.

  8. 8
    Article
    Avatar of matkladmatklad·35w

    Vibe Coding Terminal Editor

    A developer shares practical lessons learned from building a VS Code terminal editor extension using Claude AI. Key insights include using a plan.md workflow for structured LLM interactions, treating specs/code/tests as interchangeable formats, and the critical importance of fast, end-to-end testing for AI-assisted development. The author emphasizes that while LLMs excel at code generation, they require proper feedback loops and human oversight in the control plane rather than data plane.

  9. 9
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·36w

    JSON prompting for LLMs

    JSON prompting improves LLM outputs by providing structured format instead of vague natural language instructions. This technique leverages AI models' training on structured data from APIs and web applications, resulting in more consistent and predictable responses. JSON prompts eliminate ambiguity, enable output control, and create reusable templates for scalable AI workflows. While JSON is effective, alternatives like XML for Claude and Markdown also work well - the key is structure rather than specific syntax.

  10. 10
    Article
    Avatar of infoworldInfoWorld·36w

    Is the generative AI bubble about to burst?

    The generative AI boom shows similarities to the dotcom bubble, with massive investments ($364 billion expected in 2025) flowing primarily to companies like Nvidia. While Goldman Sachs argues current AI investments are justified by profits, critics point to structural limitations in large language models that prevent true reasoning capabilities. Developers using AI tools daily recognize their utility for code generation but also experience their shortcomings, suggesting the technology may be more incremental than revolutionary. Even if an AI bubble exists, survivors will likely drive lasting changes in the industry, similar to how some dotcom survivors became today's tech giants.

  11. 11
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·35w

    Build a 100% local MCP Server and Client

    Learn to build a completely local Model Context Protocol (MCP) server and client setup for enterprise-grade AI applications. The tutorial covers creating MCP servers using FastMCP, building secure local clients with mcp-use library, and integrating with Stagehand for browser automation. This approach keeps data on your own servers while enabling AI agents to perform tasks like web scraping and form filling through natural language commands.

  12. 12
    Article
    Avatar of meilisearchMeilisearch·36w

    9 advanced RAG techniques to know & how to implement them

    Advanced RAG techniques optimize retrieval-augmented generation systems beyond basic implementations. Nine key techniques include text chunking (semantic vs fixed-size), reranking with cross-encoders, metadata filtering, hybrid search combining keyword and vector methods, query rewriting for better intent understanding, autocut for dynamic text trimming, context distillation for focused summaries, and fine-tuning both LLMs and embedding models. These methods address common issues like noisy results, irrelevant context, and poor ranking. Implementation tools include Meilisearch for hybrid search, LangChain for workflow orchestration, Weaviate for vector search, and Pinecone for scalable vector databases. Evaluation focuses on retrieval accuracy, latency, precision-recall balance, and user satisfaction metrics.

  13. 13
    Article
    Avatar of devblogsDevBlogs·36w

    A C# Guide with Ollama

    GPT-OSS is OpenAI's first open-weight model since GPT-2, offering developers powerful AI capabilities without cloud dependency. The guide demonstrates how to build a local AI chat application using C# and Ollama, leveraging Microsoft.Extensions.AI libraries for unified abstractions. The 20B model runs on just 16GB of memory, making it accessible for local development. The tutorial covers setting up a console app, adding necessary NuGet packages, implementing streaming chat with conversation history, and preparing for advanced scenarios like function calling and agentic applications.

  14. 14
    Article
    Avatar of phProduct Hunt·38w

    SelfHostLLM: Calculate the GPU memory you need for LLM inference

    SelfHostLLM is a tool that helps developers calculate GPU memory requirements and maximum concurrent requests for self-hosted large language model inference. It supports popular models like Llama, Qwen, DeepSeek, and Mistral, allowing users to plan their AI infrastructure efficiently with custom configurations.

  15. 15
    Article
    Avatar of neontechNeon·37w

    Generate Apps Locally for Free: App.build Now Supports Open Source Models

    App.build now supports running open-source language models locally through Ollama, LMStudio, and OpenRouter, eliminating API costs and rate limits while maintaining data privacy. The platform enables developers to generate full-stack applications using local inference on consumer hardware like RTX 4090s or M4 MacBooks. While current open-source models lag behind closed alternatives for autonomous app generation, they're improving rapidly and offer viable alternatives for experimentation and prototyping without vendor dependencies.

  16. 16
    Article
    Avatar of joindevopsDevOps·37w

    kubewall: AI-Powered Kubernetes Dashboard

    kubewall is a free, open-source Kubernetes dashboard that integrates AI-powered troubleshooting capabilities. It supports multiple LLM providers including OpenAI, Claude, Gemini, Ollama, and others for cluster assistance. Key features include multi-cluster connectivity, live pod log streaming, manifest editing, and simple AI setup through provider selection and API key configuration.

  17. 17
    Article
    Avatar of do_communityDigitalOcean Community·35w

    Context Engineering: Moving Beyond Prompting in AI

    Context engineering is an advanced approach to working with large language models that goes beyond simple prompt crafting. It involves strategically managing the entire context window with curated information including task descriptions, examples, retrieved documents, conversation history, and external data. Unlike prompt engineering which focuses on clever single-line instructions, context engineering manages knowledge flow, memory systems, and information retrieval to build production-grade AI applications. The approach addresses context window limitations through techniques like chunking, filtering, and dynamic knowledge injection, making it essential for enterprise AI systems and autonomous agents that require consistent, accurate outputs.

  18. 18
    Article
    Avatar of meilisearchMeilisearch·36w

    10 Best RAG Tools and Platforms: Full Comparison [2025]

    A comprehensive comparison of 10 RAG tools and platforms for 2025, including Meilisearch, LangChain, RAGatouille, Verba, Haystack, Embedchain, LlamaIndex, MongoDB, Pinecone, and Vespa. Each tool is analyzed with key features, pricing, integrations, pros/cons based on user reviews, and ideal use cases. The guide covers open-source options, enterprise solutions, and search engine tools, providing selection criteria including retrieval methods, performance, scalability, integration ease, deployment options, cost considerations, and community support.

  19. 19
    Article
    Avatar of medium_jsMedium·35w

    GPT-5 System Prompt Leaked : 7 Prompt Engineering Tricks to learn

    Analysis of a leaked GPT-5 system prompt reveals seven key prompt engineering techniques including identity locking to prevent prompt injection, knowledge anchoring for temporal context, multimodal toggles for routing, personality injection for behavioral control, content safety as first-class instructions, self-denial of hidden mechanisms to prevent conspiracy theories, and dynamic retrieval gates for up-to-date information. The techniques demonstrate advanced strategies for building robust AI systems through careful prompt design rather than fine-tuning.

  20. 20
    Article
    Avatar of tdsTowards Data Science·37w

    LangGraph 101: Let’s Build A Deep Research Agent

    A comprehensive tutorial on building AI research agents using LangGraph, Google's open-source framework. Covers core concepts including graph-based workflow modeling with nodes and edges, state management for agent memory, structured outputs for reliable LLM responses, tool calling for web searches, conditional routing for decision-making, and parallel processing for concurrent operations. Uses Google's Deep Research Agent implementation as a practical example, demonstrating how to create agents that can autonomously search the web, evaluate results, and generate comprehensive reports with citations.

  21. 21
    Article
    Avatar of hnHacker News·38w

    synthetic-lab/octofriend: An open-source coding helper. Very friendly!

    Octofriend is an open-source coding assistant that works with multiple LLM providers including OpenAI, Anthropic, and local models. It features automatic error handling with custom-trained models, zero telemetry, multi-turn conversation support, and MCP server integration. The tool allows switching between models mid-conversation and includes configurable instruction files for project-specific rules.

  22. 22
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·35w

    Corrective RAG Agentic Workflow

    Corrective RAG (CRAG) enhances traditional RAG systems by adding a self-assessment step that evaluates retrieved document relevance before generating responses. The workflow searches documents, uses an LLM to assess context relevance, retains only relevant information, performs web search when needed, and aggregates context for final response generation. The implementation uses a tech stack including Firecrawl for web search, Milvus for vector storage, Beam for deployment, and LlamaIndex workflows for orchestration, with observability through CometML's Opik.

  23. 23
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·35w

    ​Build a YC job-finder Agentic workflow​

    Explores building AI agent workflows using Sim, a no-code framework for creating agentic systems. Demonstrates building a YC startup job finder connected to Telegram, while covering the evolution of AI agents from simple LLMs to sophisticated systems with reasoning, memory, and tool usage. Includes a comprehensive crash course covering agent fundamentals, multi-agent systems, memory types, and advanced patterns like ReAct and Planning.

  24. 24
    Article
    Avatar of dailydoseofdsDaily Dose of Data Science | Avi Chawla | Substack·35w

    4 Layers of Agentic AI Systems

    Agentic AI systems are built on four distinct layers: LLMs as the foundation providing tokenization and inference capabilities, AI Agents that add autonomous behavior through tool usage and reasoning, Agentic Systems that coordinate multiple agents through communication protocols and orchestration frameworks, and Agentic Infrastructure that ensures production readiness with observability, security, and scalability features. Each layer builds upon the previous one to create robust, enterprise-ready AI systems.

  25. 25
    Article
    Avatar of logrocketLogRocket·35w

    How to protect your AI agent from prompt injection attacks

    Six design patterns help protect LLM agents from prompt injection attacks: Action-Selector limits responses to predefined actions, Plan-Then-Execute creates fixed plans before processing untrusted data, LLM Map-Reduce isolates processing of malicious inputs, Dual LLM separates privileged and quarantined models, Code-Then-Execute generates programs in sandboxed environments, and Context-Minimization removes potentially harmful prompts from conversation history. Each pattern offers different trade-offs between security and functionality.