Whisper, a speech recognition model trained on 680,000 hours of audio taken from the web, demonstrates the problems with Big Tech and data theft.


Hacker News

Whisper is a speech recognition model by OpenAI, trained on 680,000 hours of audio. It can transcribe speech in multiple languages and translate it into English. Despite its groundbreaking capabilities, Whisper received comparatively little attention because public focus was on generative AI models. The post argues that Whisper's use of indigenous data raises concerns about data extraction and data sovereignty, and that open sourcing models like Whisper primarily benefits those already in the tech industry while neglecting indigenous and marginalized communities. In the author's view, the decision of whether to open source such models should rest with the communities from whom the data was collected.

OpenAI's Whisper is another case study in Colonisation