An exploration of dimensionality reduction (matrix factorization) as an alternative to Twitter's DISCO approach for computing cosine similarities at scale. Factorizing a user-item matrix gives each user and item a low-dimensional vector, so the cosine similarity of any pair costs only O(f), where f is the small number of dimensions. The factorization also reduces noise in the data. The post recommends combining it with locality-sensitive hashing to find similar pairs without comparing every pair, and points to two key papers on scalable collaborative filtering for implicit feedback datasets.
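
To make the O(f) claim and the hashing step concrete, here is a minimal sketch in plain Python. It assumes the factorization has already been run and simply takes the resulting vectors as input. The item names, the vector values, and the helper names (`cosine`, `make_hyperplanes`, `lsh_signature`, `bucket_items`) are hypothetical, and random-hyperplane hashing is only one common choice of locality-sensitive hash for cosine similarity, not necessarily the one the post has in mind.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two f-dimensional vectors, computed in O(f)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def make_hyperplanes(n_bits, f, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in f dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(f)] for _ in range(n_bits)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def bucket_items(vectors, hyperplanes):
    """Group items by signature; only items sharing a bucket get compared."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[lsh_signature(vec, hyperplanes)].append(name)
    return dict(buckets)


# Hypothetical 3-dimensional item vectors, standing in for factorization output.
item_vectors = {
    "a": [1.0, 2.0, 0.5],
    "b": [2.0, 4.0, 1.0],    # same direction as "a", twice the length
    "c": [-1.0, 0.5, -2.0],
}

hyperplanes = make_hyperplanes(n_bits=8, f=3)
buckets = bucket_items(item_vectors, hyperplanes)
```

Because "a" and "b" point in the same direction, they always receive the same signature and land in the same bucket, so `cosine` only needs to be evaluated within buckets rather than over all pairs. More bits make buckets smaller and more selective, at the cost of missing some similar pairs; in practice several independent hash tables are often used to offset that.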

From erikbern.com (3 min read)