This post explores the key differences between Pearson correlation and cosine similarity, two measures of how strongly two variables are related. Both are built on the dot product, but correlation normalizes twice (mean-centering each variable, then scaling by its standard deviation), whereas cosine similarity only normalizes by vector magnitude. Through mathematical explanations and Python simulations, the post demonstrates that the two measures can yield dramatically different results depending on how the data are scaled and offset. Correlation is recommended when measurement units are arbitrary or differ between variables; cosine similarity is preferred when the variables share meaningful units, particularly in machine learning applications that compare vector embeddings.
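To make the distinction concrete, here is a minimal sketch (not the post's own simulation code) that computes both measures with NumPy. The helper names `cosine_similarity` and `pearson_correlation`, the synthetic data, and the offset of 10 are all illustrative assumptions. It treats Pearson correlation as the cosine similarity of the mean-centered vectors, which is the "double normalization" described above.

```python
import numpy as np

def cosine_similarity(x, y):
    # Dot product normalized only by the two vector magnitudes.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson_correlation(x, y):
    # Mean-center first, then normalize by magnitude:
    # this equals the cosine similarity of the centered vectors.
    return cosine_similarity(x - x.mean(), y - y.mean())

# Synthetic, roughly zero-mean data (illustrative assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

r_original = pearson_correlation(x, y)
c_original = cosine_similarity(x, y)

# Add the same constant offset to both variables.
x_shifted = x + 10.0
y_shifted = y + 10.0
r_shifted = pearson_correlation(x_shifted, y_shifted)
c_shifted = cosine_similarity(x_shifted, y_shifted)

print(f"correlation: {r_original:.3f} -> {r_shifted:.3f}")
print(f"cosine:      {c_original:.3f} -> {c_shifted:.3f}")
```

In this sketch the correlation is unchanged by the offset, because mean-centering removes it, while the cosine similarity moves close to 1, because both shifted vectors now point mostly along the same constant direction.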

10m read time
From thepalindrome.org
Table of contents
The dot product
Pearson correlation: The doubly-normalized dot product
Cosine similarity
Code simulations to build understanding
Systematic comparison of correlation and cosine similarity
When to use which?
