A data-driven analysis of the pitch (fundamental frequency) of spoken language across dozens of languages and regional variants. Using roughly 100k scraped audio clips, FFT-based frequency analysis, and bootstrapped statistics, the author finds meaningful pitch differences between languages, with Hungarian among the lowest and Chinese among the highest. The post also explores within-language variation by speaker origin, describes a curious artifact in which AC power frequencies (50 Hz vs 60 Hz) show up in audio recordings, and speculates on why native pitch may make foreign accents harder to shed. The methodology, which covers Butterworth bandpass filtering, FFT aggregation, and bootstrapping for uncertainty estimation, is explained in detail, with code available on GitHub.
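To make the bootstrapping step concrete, here is a minimal sketch of a percentile bootstrap for the uncertainty of a mean pitch. It is not the author's code: the function name `bootstrap_ci`, its parameters, and the per-clip pitch values are all hypothetical and chosen only for illustration, and the post may well use a different statistic or resampling scheme.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration; not taken from the post's code.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n values with replacement and record the resample's mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Made-up per-clip pitch estimates in Hz (illustrative only, not the post's data).
clip_pitches_hz = [112.0, 125.5, 131.0, 118.2, 140.7, 109.9, 127.3, 133.8, 121.4, 116.6]

mean_hz, ci_low, ci_high = bootstrap_ci(clip_pitches_hz)
print(f"mean = {mean_hz:.1f} Hz, 95% CI = [{ci_low:.1f}, {ci_high:.1f}] Hz")
```

Resampling with replacement approximates how much the mean would vary if a different set of clips had been collected, which is why the width of the interval can be read as an uncertainty estimate for each language's pitch.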