Best of Hugging Face: September 2024

  1. Article
    Hugging Face · 2y

    Llama can now see and run on your device - welcome Llama 3.2

    Llama 3.2, released in collaboration with Meta and available on Hugging Face, includes both multimodal vision models and text-only models. The vision models come in 11B and 90B sizes and offer strong visual reasoning. The text-only models come in 1B and 3B sizes and are optimized for on-device use. Llama 3.2 also introduces a new version of Llama Guard for input classification, including harmful prompt detection. The models are integrated with Hugging Face Transformers and supported on major cloud services, and they can be fine-tuned on a single GPU.

  2. Article
    Hugging Face · 2y

    Fine-tuning LLMs to 1.58bit: extreme quantization made easy

    As large language models (LLMs) grow, reducing their computational and energy costs through quantization becomes crucial. BitNet, a transformer architecture from Microsoft Research, cuts these costs by representing each parameter with one of three values (-1, 0, 1), which works out to log2(3) ≈ 1.58 bits per parameter. The post details how existing models, such as Llama 3, can be fine-tuned to this 1.58-bit format while maintaining accuracy. It also covers the implementation, optimization, and benchmarking of custom inference kernels, which make such models more scalable and practical.
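
The ternary representation mentioned above can be illustrated with a minimal sketch. It assumes an absmean scaling rule of the kind described for BitNet b1.58 (divide by the mean absolute weight, round, clip to -1..1); the summary itself does not specify the scheme, and the function names `ternary_quantize` and `dequantize` are hypothetical, chosen only for this example.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map each weight to -1, 0, or 1 (assumed absmean scheme, not the post's exact code)."""
    # Scale is the mean absolute value of the weights; eps avoids division by zero.
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Round the scaled weight to the nearest integer, then clip to the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate the original weights from ternary values and the shared scale."""
    return [q * scale for q in quantized]


# Three possible values carry log2(3) bits of information per parameter.
bits_per_parameter = math.log2(3)

weights = [0.9, -0.05, -1.4, 0.3, 0.0, -0.6]
quantized, scale = ternary_quantize(weights)
approx = dequantize(quantized, scale)

print(quantized)                     # ternary codes
print(round(bits_per_parameter, 2))  # bits needed per ternary parameter
```

Only the ternary codes and one shared scale need to be stored, which is where the memory savings come from; the cost is that small weights collapse to 0 and large ones saturate at the scale.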