The Open ASR Leaderboard is adding private, high-quality English ASR datasets from Appen Inc. and DataoceanAI to combat benchmaxxing and test-set contamination. These datasets cover scripted and conversational speech across multiple accents (American, British, Australian, Canadian, Indian). The private data is kept hidden from model developers to prevent gaming the benchmark, and by default the leaderboard's Average WER is still computed on public datasets only — users can toggle private datasets on to see their impact. Aggregate metrics are provided (scripted vs. conversational, US vs. non-US accents) without per-split scores to prevent targeted optimization. Models can be submitted via GitHub pull request for evaluation on both public and private sets.
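To make the aggregation concrete, here is a minimal sketch of word error rate (WER) and a macro-average over dataset splits, illustrating a public-only default with private splits toggled on. This is an illustration only, not the leaderboard's actual pipeline; the dataset names and scores below are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                             # deletion
                       d[j - 1] + 1,                         # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev = cur
    return d[-1] / len(ref)

def average_wer(*splits: dict) -> float:
    """Macro-average WER over any number of dataset-score mappings."""
    scores = [score for split in splits for score in split.values()]
    return sum(scores) / len(scores)

# Hypothetical per-dataset scores, for illustration only.
public = {"dataset_a": 0.021, "dataset_b": 0.035}
private = {"appen_conversational": 0.081, "dataocean_scripted": 0.044}

print(average_wer(public))           # default: public datasets only
print(average_wer(public, private))  # with private datasets toggled on
```

Including the private splits typically raises the average, since conversational and accented speech is harder than the public scripted benchmarks; the toggle makes that gap visible without exposing per-split scores.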

From huggingface.co
Table of contents

- New high-quality, private datasets
- How can I evaluate my model on this data?
- Do models trained on the data providers have an advantage?
- What's next?
