Hugging Face's platform is a resource for developers and researchers working in natural language processing (NLP) and machine learning. Through articles, tutorials, and open-source projects, it covers NLP models, tools, and datasets, along with state-of-the-art techniques, transformer architectures, and transfer learning methods. Developers can learn how to use pre-trained models, apply fine-tuning strategies, and deploy NLP applications with Hugging Face's libraries and APIs.

Hugging Face

The llama.cpp server now supports router mode, which lets it dynamically load, unload, and switch between multiple LLMs without a restart. Router mode auto-discovers GGUF models in the cache or in custom directories and loads each one on demand. Once the concurrent model limit is reached (default: 4), it evicts the least recently used (LRU) model. Each model runs in its own process for isolation. The server exposes OpenAI-compatible HTTP endpoints for chat completions and for model management operations.

New in llama.cpp: Model Management
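The on-demand loading and LRU eviction described in the summary can be illustrated with a small, self-contained sketch. It is not llama.cpp's code: the `ModelRouter` class, its `get` method, and the `load`/`unload` callbacks are hypothetical stand-ins, and the per-model processes are replaced by placeholder handles. It only shows how a fixed limit (4 here, matching the stated default) combined with least-recently-used ordering decides which model is unloaded when a new one is requested.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelRouter:
    """Hypothetical sketch of on-demand model loading with LRU eviction.

    Not llama.cpp's implementation: "loading" is simulated by a callback,
    and each handle stands in for a separate model process.
    """

    def __init__(self, load: Callable[[str], object],
                 unload: Callable[[str, object], None],
                 max_models: int = 4) -> None:
        if max_models < 1:
            raise ValueError("max_models must be at least 1")
        self._load = load
        self._unload = unload
        self._max_models = max_models
        # Insertion order doubles as recency order: least recently used first.
        self._loaded: "OrderedDict[str, object]" = OrderedDict()

    def get(self, name: str) -> object:
        """Return a loaded model, loading it (and evicting if needed) on demand."""
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        if len(self._loaded) >= self._max_models:
            # At the limit: unload the least recently used model first.
            victim, handle = self._loaded.popitem(last=False)
            self._unload(victim, handle)
        handle = self._load(name)
        self._loaded[name] = handle
        return handle

    def loaded(self) -> List[str]:
        """Loaded model names, from least to most recently used."""
        return list(self._loaded)


# Usage: record load/unload events for a sequence of requested models.
events = []


def fake_load(name: str) -> str:
    events.append(("load", name))
    return f"proc:{name}"  # placeholder for a model process handle


def fake_unload(name: str, handle: object) -> None:
    events.append(("unload", name))


router = ModelRouter(load=fake_load, unload=fake_unload)
for requested in ["a", "b", "c", "d", "a", "e"]:
    router.get(requested)

print(router.loaded())  # ['c', 'd', 'a', 'e']
```

Requesting "a" again before "e" refreshes its recency, so "b" becomes the least recently used model and is the one unloaded when "e" arrives at the limit of four.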