Hugging Face

Llama 3.2, released by Meta and available on Hugging Face, includes both multimodal vision models and text-only models. The vision models come in 11B and 90B sizes and feature strong visual reasoning capabilities. The text-only models come in 1B and 3B sizes and are optimized for on-device use. Llama 3.2 also introduces a new version of Llama Guard for input classification, including harmful prompt detection. The models integrate with Hugging Face Transformers and major cloud services, and they can be fine-tuned on a single GPU.

Llama can now see and run on your device - welcome Llama 3.2

As large language models (LLMs) grow, reducing their computational and energy costs through quantization becomes crucial. BitNet, a transformer architecture from Microsoft Research, cuts these costs by restricting each parameter to a ternary value (-1, 0, or 1), which works out to about 1.58 bits per parameter (log2 of 3). The post details how existing models, such as Llama 3, can be fine-tuned to this 1.58-bit format while largely maintaining accuracy. It also covers the implementation, optimization, and benchmarking of custom inference kernels, with the aim of making LLMs more scalable and practical.

Fine-tuning LLMs to 1.58bit: extreme quantization made easy
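
To make the ternary idea concrete, here is a minimal sketch of weight quantization to (-1, 0, 1). It assumes the absmean rounding scheme described for BitNet b1.58: divide by the mean absolute weight, round, then clip. The function names and the sample weights are illustrative and are not taken from the post.

```python
import math


def absmean_ternary_quantize(weights, eps=1e-8):
    """Map a flat list of floats to values in {-1, 0, 1} plus one shared scale.

    Assumption: absmean scaling, i.e. the scale is the mean absolute weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    scale = max(scale, eps)  # guard against an all-zero weight list
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate weights from ternary values and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
quantized, scale = absmean_ternary_quantize(weights)
approx = dequantize(quantized, scale)

# Three possible states per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)

print(quantized)                  # [1, 0, 1, -1, 0, -1]
print(round(bits_per_weight, 2))  # 1.58
```

Small weights collapse to 0 and large ones saturate at -1 or 1, so the only floating-point value kept per weight group is the scale. That is where the "1.58 bits" figure comes from: each weight takes one of three states.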

Hugging Face's platform is a resource for developers and researchers working in natural language processing (NLP) and machine learning. Its articles, tutorials, and open-source projects cover NLP models, tools, and datasets, along with state-of-the-art techniques such as transformer architectures and transfer learning. Developers can learn how to use pre-trained models, apply fine-tuning strategies, and deploy NLP applications with Hugging Face's libraries and APIs.

Best of Hugging Face — September 2024