Best of Hugging Face 2025

  1. Article
    Hugging Face · 48w

    Tiny Agents in Python: an MCP-powered agent in ~70 lines of code

    The post introduces a method to create MCP-powered agents in Python, highlighting a simplified setup for integrating external tools with large language models (LLMs). By using the Model Context Protocol (MCP), these agents can easily interact with various tools without custom integration. The guide details the setup and execution of such agents using the huggingface_hub, showcasing potential use cases and possible configurations. It emphasizes the role of the MCPClient in facilitating asynchronous connections to MCP servers, tool discovery, and execution.
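The control flow the post describes (the LLM proposes tool calls, the client executes them against discovered tools, and the loop ends when the model answers directly) can be sketched in plain Python. Everything here is an illustrative stand-in, with a stubbed LLM and a hand-filled tool registry, not the actual `huggingface_hub` MCPClient API:

```python
# Toy sketch of a tiny-agent loop: the LLM requests tool calls, the client
# dispatches them to tools (which MCP discovery would normally populate),
# and the loop stops once the model returns a plain answer.

def list_files(path: str) -> str:
    """Stand-in for a tool an MCP filesystem server would expose."""
    return f"contents of {path}: notes.txt, report.md"

TOOLS = {"list_files": list_files}  # MCP tool discovery would fill this

def fake_llm(messages):
    """Stub LLM: asks for one tool call, then answers from the result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "list_files", "args": {"path": "/docs"}}
    return {"answer": "Found notes.txt and report.md in /docs."}

def run_agent(prompt: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = fake_llm(messages)
        if "answer" in reply:                           # final response
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute tool call
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")

print(run_agent("What files are in /docs?"))
```

The real agent is asynchronous and streams tokens, but the while-loop-around-tool-calls shape is the same.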

  2. Article
    Hugging Face · 52w

    Tiny Agents: an MCP-powered agent in 50 lines of code

    Discover how to implement a small and powerful AI agent using Model Context Protocol (MCP) in just 50 lines of code. The post covers the integration of MCP with large language models (LLMs) to create agentic AI, featuring JavaScript and TypeScript components with Hugging Face's SDKs and tools. It also demonstrates the use of MCP servers and shows how tools can be utilized within an LLM inference client.

  3. Article
    Hugging Face · 1y

    The NLP Course is becoming the LLM Course!

    Hugging Face is upgrading its NLP course by renaming it to the LLM course, reflecting the latest advancements in AI. The revamped course will include new chapters on fine-tuning LLMs and building reasoning models, alongside maintaining and updating existing NLP content. The goal is to make cutting-edge research accessible and community-driven, with interactive exercises and live sessions available where beneficial.

  4. Article
    Hugging Face · 1y

    FastRTC: The Real-Time Communication Library for Python

    FastRTC is a new real-time communication library for Python designed to simplify the building of real-time audio and video AI applications. It supports features such as automatic voice detection, WebRTC-enabled Gradio UI, and integration capabilities with FastAPI. The library also includes utilities for text-to-speech, speech-to-text, and other key functionalities, making it easy to develop and deploy real-time applications.

  5. Article
    Hugging Face · 35w

    MCP for Research: How to Connect AI to Research Tools

    Model Context Protocol (MCP) enables AI systems to automate academic research discovery by connecting to tools that search across platforms like arXiv, GitHub, and Hugging Face. The approach progresses through three abstraction layers: manual research, scripted automation, and AI-orchestrated natural language workflows. MCP allows researchers to use natural language requests to gather comprehensive information about papers, implementations, and related resources, though it requires human oversight for quality control.
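Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages; the `tools/call` method with a `name`/`arguments` params object follows the MCP specification, while the `search_papers` tool and its arguments below are hypothetical examples of a research tool:

```python
import json

# Minimal sketch of the JSON-RPC 2.0 request an MCP client sends to invoke
# a tool on a server. The method and params shape come from the MCP spec;
# the tool name and arguments are invented for illustration.

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)

msg = make_tool_call(1, "search_papers",
                     {"query": "diffusion models", "max_results": 5})
print(msg)
```

A natural-language research request ultimately fans out into a handful of messages like this one, sent to whichever servers expose arXiv, GitHub, or Hub search tools.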

  6. Article
    Hugging Face · 46w

    ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

    ScreenSuite is a comprehensive evaluation framework for GUI agents that unifies 13 benchmarks across perception, grounding, single-step actions, and multi-step agent capabilities. The suite evaluates vision language models on their ability to interact with graphical interfaces using only visual input, without accessibility trees or DOM metadata. It includes Dockerized environments for Ubuntu and Android testing, supports both local and remote sandbox execution, and provides standardized evaluation of leading VLMs like Qwen-2.5-VL series, UI-TARS, and GPT-4o on GUI automation tasks.

  7. Article
    Hugging Face · 1y

    Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC

    Hugging Face and Cloudflare have partnered to provide AI developers with seamless real-time speech and video capabilities using FastRTC. This collaboration allows developers to use enterprise-grade WebRTC infrastructure with minimal setup and reliable global connectivity via Cloudflare's TURN servers. AI developers can now access 10GB of free data streaming per month using a Hugging Face token and easily scale using Cloudflare's network.

  8. Article
    Hugging Face · 38w

    Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

    Hugging Face introduces Trackio, a lightweight open-source Python library for machine learning experiment tracking. It offers a wandb-compatible API, a local-first approach with optional Hugging Face Spaces hosting, easy sharing via URLs and iframes, and built-in GPU energy usage tracking. The library integrates seamlessly with Transformers and Accelerate, stores data in SQLite with Parquet backups, and provides free hosting on Hugging Face Spaces with both public and private options.
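The "local-first, wandb-style API backed by SQLite" pattern is easy to picture with a toy version. This mimics only the API shape (log calls persisted locally); Trackio's actual schema and internals differ:

```python
import json
import sqlite3

# Toy illustration of a local-first experiment tracker: metric dicts are
# appended to a local SQLite table, one row per logged step. Not Trackio's
# real implementation, just the pattern.

class ToyTracker:
    def __init__(self, project: str, db_path: str = ":memory:"):
        self.project = project
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS metrics (project TEXT, step INTEGER, data TEXT)"
        )
        self.step = 0

    def log(self, metrics: dict) -> None:
        self.db.execute(
            "INSERT INTO metrics VALUES (?, ?, ?)",
            (self.project, self.step, json.dumps(metrics)),
        )
        self.step += 1

    def finish(self) -> int:
        self.db.commit()
        (count,) = self.db.execute("SELECT COUNT(*) FROM metrics").fetchone()
        return count

run = ToyTracker("demo")
run.log({"loss": 0.9})
run.log({"loss": 0.7})
print(run.finish())  # 2 rows logged
```

Because everything lives in a local database, runs work offline by default, and syncing to a hosted dashboard becomes an optional export step rather than a requirement.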

  9. Article
    Hugging Face · 20w

    Transformers v5: Simple model definitions powering the AI ecosystem

    Hugging Face releases Transformers v5, marking five years since v4 with daily installs growing from 20,000 to 3 million. The library now supports over 400 model architectures and 750,000 community checkpoints. Version 5 focuses on simplicity through modular design, improved training support for both pre-training and fine-tuning, enhanced inference capabilities with continuous batching and a new serving API, and first-class quantization support. The release emphasizes interoperability across the ecosystem, enabling seamless integration with inference engines like vLLM and SGLang, local deployment tools like llama.cpp and MLX, and training frameworks like Unsloth and Axolotl.

  10. Article
    Hugging Face · 25w

    huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning

    The huggingface_hub Python library has reached v1.0 after five years of development, now powering 200,000 dependent libraries and providing access to over 2 million models, 500,000 datasets, and 1 million Spaces. Major changes include migration from requests to httpx for modern HTTP infrastructure, a redesigned CLI replacing huggingface-cli with expanded features, and full adoption of hf_xet for file transfers with chunk-level deduplication. The release removes legacy patterns like the Git-based Repository class while maintaining backward compatibility for most ML libraries, though transformers v5 will be required for full v1.x support.

  11. Article
    Hugging Face · 32w

    Jupyter Agents: training LLMs to reason with notebooks

    Hugging Face developed Jupyter Agent, a system that trains small language models to perform data science tasks by executing code in Jupyter notebooks. They created a comprehensive pipeline starting with 2TB of Kaggle notebooks, applied deduplication and quality filtering, generated synthetic question-answer pairs, and fine-tuned Qwen3-4B models. The approach achieved 75% accuracy on easy DABStep benchmark tasks, demonstrating that smaller models can become effective data science agents with proper training data and scaffolding. The project includes open-source datasets, trained models, and a simplified 200-line scaffolding system.
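The deduplication stage of such a pipeline can be sketched with exact hashing: normalize each notebook's text, hash it, and keep one copy per hash. The real pipeline also applies fuzzy dedup and quality filters; this shows only the simplest stage:

```python
import hashlib

# Exact deduplication of a notebook corpus: hash the whitespace-normalized,
# lowercased content and keep the first occurrence of each digest.

def dedup(notebooks: list[str]) -> list[str]:
    seen, kept = set(), []
    for nb in notebooks:
        normalized = " ".join(nb.split()).lower()   # collapse whitespace
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(nb)
    return kept

corpus = ["import pandas as pd", "import  pandas as pd", "print('hi')"]
print(len(dedup(corpus)))  # 2 unique notebooks
```

At the 2TB scale described, this kind of pass runs streaming over shards, but the keep-first-per-digest logic is the same.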

  12. Article
    Hugging Face · 30w

    Gaia2 and ARE: Empowering the community to study agents

    Hugging Face introduces Gaia2, an advanced AI agent benchmark that goes beyond read-only tasks to evaluate interactive behaviors in real-world conditions. Unlike its predecessor GAIA, Gaia2 tests agents on complex scenarios including ambiguity handling, time-sensitive actions, and noise tolerance using a smartphone mock-up environment. The release includes the open-source Agent Research Environments (ARE) framework for running, debugging, and evaluating agents with structured trace recording. Current results show GPT-5 as the top performer, while temporal reasoning remains challenging for all models. The platform enables researchers to create custom scenarios and connect their own tools via MCP integration.

  13. Article
    Hugging Face · 51w

    Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs

    AutoRound is Intel's advanced post-training quantization tool for large language and vision-language models, designed to reduce model size and inference latency while maintaining high accuracy. It uses signed gradient descent to optimize weight rounding and clipping ranges for low-bit quantization (e.g., INT2 to INT8) with minimal accuracy loss. The tool supports a variety of model architectures and devices, and quantizes quickly, needing only a small calibration dataset. AutoRound is compatible with popular export formats and provides flexibility in quantization configurations.
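To see what AutoRound improves on, here is the baseline it starts from: plain round-to-nearest (RTN) weight quantization on a symmetric low-bit grid. AutoRound's contribution is to *learn* a per-weight rounding offset and the clipping range via signed gradient descent instead of always snapping to the nearest level; this pure-Python sketch shows only the RTN baseline:

```python
# Round-to-nearest (RTN) quantization to a symmetric INT4 grid
# (integer levels -8..7). AutoRound replaces the fixed round() with a
# learned rounding offset and learned clipping, tuned by signed SGD.

def quantize_rtn(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1                    # 7 for INT4
    scale = max(abs(w) for w in weights) / qmax   # symmetric per-group scale
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.31, -0.52, 0.07, 0.49]
q, s = quantize_rtn(w)
w_hat = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(err, 3))
```

Even on four weights the reconstruction error is visible; at INT2 and INT3 the rounding decision dominates accuracy, which is why optimizing it pays off.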

  14. Article
    Hugging Face · 1y

    How to deploy and fine-tune DeepSeek models on AWS

    Learn how to deploy and fine-tune DeepSeek R1 models using Hugging Face on AWS services. The guide covers deployment with Hugging Face Inference Endpoints and Amazon SageMaker AI, including both GPU and Neuron instances. It also provides code snippets for deployment and highlights the benefits of these platforms, such as simplified infrastructure management and cost savings.

  15. Article
    Hugging Face · 29w

    SOTA OCR with Core ML and dots.ocr

    A detailed walkthrough of converting the dots.ocr model (a 3B parameter OCR model from RedNote) to run on Apple devices using Core ML and MLX. The guide covers the conversion process from PyTorch to Core ML, including simplifying the model architecture, debugging common conversion errors, and initial benchmarking. Key challenges addressed include handling attention implementations, fixing dtype mismatches, removing dynamic control flow, and dealing with variable-length sequence masking. The converted model initially runs on GPU in FLOAT32 precision, with future parts promising Neural Engine optimization and quantization techniques.
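One of the conversion headaches mentioned, variable-length sequence masking without dynamic control flow, has a standard fix: replace data-dependent `if position < length` branches (which tracers cannot capture) with an elementwise comparison that produces the mask as a tensor op. A pure-Python sketch of the idea; frameworks express the same thing with broadcasting, e.g. comparing an `arange` of positions against a lengths column:

```python
# Branch-free padding mask: mask[i][j] == 1.0 where position j is valid
# for sequence i. The comparison j < n *is* the elementwise op; there is
# no data-dependent branch for a tracer to choke on.

def padding_mask(lengths, max_len):
    return [[float(j < n) for j in range(max_len)] for n in lengths]

print(padding_mask([2, 4], 4))  # [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
```

The same trick generalizes to causal masks (compare row index against column index), which keeps the exported Core ML graph fully static.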

  16. Article
    Hugging Face · 29w

    VibeGame: Exploring Vibe Coding Games

    VibeGame is a new high-level declarative game engine built on three.js, designed specifically for AI-assisted game development. The author explores the challenges of 'vibe coding' games using AI, comparing platforms like Roblox, Unity, and web technologies. The solution combines web stack's excellent AI performance with high-level abstractions similar to Roblox, using XML-like syntax and Entity-Component-System architecture. While effective for basic games, it currently struggles with complex features like multiplayer and combat systems.

  17. Article
    Hugging Face · 1y

    LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

    This guide demonstrates how to run large language models (LLMs) on mobile devices using React Native. It walks through the creation of a mobile app that allows users to chat with AI models locally, ensuring privacy and offline functionality. The tutorial also covers choosing the right model sizes, understanding GGUF quantization formats, setting up the React Native environment, and implementing features such as a chat interface, model downloading, and state management. Additional advanced features like generation on the fly, auto-scrolling, and inference speed tracking are also discussed.

  18. Article
    Hugging Face · 41w

    Upskill your LLMs with Gradio MCP Servers

    The Model Context Protocol (MCP) enables developers to extend Large Language Models with specialized tools and capabilities. Gradio apps on Hugging Face Spaces now support MCP, creating an "app store" of thousands of AI-powered tools that can be connected to LLMs. The post demonstrates how to integrate the Flux.1 Kontext image editing model as an MCP server with Cursor, allowing the LLM to edit images from text prompts. This approach transforms LLMs from simple question-answering systems into powerful assistants with diverse capabilities like image editing, web browsing, and data processing.

  19. Article
    Hugging Face · 44w

    (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

    QLoRA enables fine-tuning of FLUX.1-dev diffusion models on consumer hardware with under 10GB VRAM by combining 4-bit quantization with Low-Rank Adaptation. The approach uses bitsandbytes for quantization, 8-bit AdamW optimizer, gradient checkpointing, and cached latents to dramatically reduce memory usage from ~120GB to ~9GB. Training on RTX 4090 takes 41 minutes for 700 steps, while FP8 training with torchao on H100 reduces time to 20 minutes. The technique maintains high-quality results while making advanced model customization accessible to developers without enterprise-grade hardware.
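The weight-memory side of those savings is simple arithmetic. Assuming a roughly 12B-parameter base model (FLUX.1-dev's size), frozen weights cost 2 bytes per parameter in BF16 versus half a byte at 4-bit, while the trainable LoRA adapter (the 50M figure below is a hypothetical adapter size, not from the post) adds only two 8-bit AdamW moments per trainable parameter. Activations, gradients, and caching account for the rest of the real footprint:

```python
# Back-of-envelope GPU memory for QLoRA fine-tuning of a ~12B-param model.
# Assumptions (not from the post): 12e9 base params, 50e6 LoRA params.

GIB = 1024 ** 3
params = 12e9

bf16_weights = params * 2 / GIB              # 2 bytes/param, frozen base in BF16
int4_weights = params * 0.5 / GIB            # 4 bits/param, NF4-style base

lora_params = 50e6                           # hypothetical adapter size
adam8bit_state = lora_params * 2 * 1 / GIB   # two 8-bit moments per trainable param

print(round(bf16_weights, 1), round(int4_weights, 1), round(adam8bit_state, 2))
```

The base weights alone drop from roughly 22 GiB to under 6 GiB, and the optimizer state for the adapter is negligible, which is how the whole job fits under 10GB of VRAM.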

  20. Article
    Hugging Face · 49w

    The Transformers Library: standardizing model definitions

    The Transformers library aims to be the central hub for model architectures across various frameworks, supporting over 300 models with consistent updates. It integrates with major training frameworks and inference engines, offering significant interoperability and efficiency. Efforts are underway to simplify model definitions and contributions to reduce complexity for model creators, enhancing ecosystem standardization.

  21. Article
    Hugging Face · 49w

    Vision Language Models (Better, Faster, Stronger)

    This post reviews the developments in vision language models over the past year, highlighting new model architectures, specialized capabilities, and emerging paradigms. It covers trends such as any-to-any models, reasoning models, smaller yet capable models, and multimodal safety models, offering insights into how these innovations are shaping the future of AI.