The Open ASR Leaderboard is adding private, high-quality English ASR datasets from Appen Inc. and DataoceanAI to combat benchmaxxing and test-set contamination. These datasets cover scripted and conversational speech across multiple accents (American, British, Australian, Canadian, Indian). The private data is kept hidden from model developers to prevent gaming the benchmark, and by default the leaderboard's Average WER is still computed on public datasets only — users can toggle private datasets on to see their impact. Aggregate metrics are provided (scripted vs. conversational, US vs. non-US accents) without per-split scores to prevent targeted optimization. Models can be submitted via GitHub pull request for evaluation on both public and private sets.
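To make the aggregation concrete, here is a minimal sketch of word error rate (WER) and a macro-average over dataset splits, illustrating a public-only default with private splits toggled on. This is an illustration only, not the leaderboard's actual pipeline; the dataset names and scores below are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                             # deletion
                       d[j - 1] + 1,                         # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev = cur
    return d[-1] / len(ref)

def average_wer(*splits: dict) -> float:
    """Macro-average WER over any number of dataset-score mappings."""
    scores = [score for split in splits for score in split.values()]
    return sum(scores) / len(scores)

# Hypothetical per-dataset scores, for illustration only.
public = {"dataset_a": 0.021, "dataset_b": 0.035}
private = {"appen_conversational": 0.081, "dataocean_scripted": 0.044}

print(average_wer(public))           # default: public datasets only
print(average_wer(public, private))  # with private datasets toggled on
```

Including the private splits typically raises the average, since conversational and accented speech is harder than the public scripted benchmarks; the toggle makes that gap visible without exposing per-split scores.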

From huggingface.co
Table of contents

- New high-quality, private datasets
- How can I evaluate my model on this data?
- Do models trained on the data providers have an advantage?
- What's next?
