A critique of GPU-accelerated deep learning from 2016, arguing that GPUs excel at dense, parallelizable operations such as convolutional neural networks, but struggle with the logarithmic data structures required by techniques like hierarchical softmax in word2vec. The author suggests that for NLP and collaborative filtering at …
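To make the "logarithmic data structures" point concrete, here is a minimal sketch (not from the article) of why hierarchical softmax is tree-shaped: each word is a leaf of a binary tree, and its probability is a product of sigmoid branch decisions along the root-to-leaf path, costing O(log V) per word instead of O(V). The function name and toy scores below are illustrative, not word2vec's actual API; the irregular, branchy traversal is exactly the kind of work the article argues GPUs handle poorly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hierarchical_prob(path_scores, directions):
    """Probability of one leaf word in a hierarchical softmax.

    path_scores: inner-node logits along the root-to-leaf path
                 (dot products of the hidden vector with node vectors).
    directions:  +1 for a left branch, -1 for a right branch at each node.
    """
    p = 1.0
    for score, d in zip(path_scores, directions):
        # Each inner node makes a binary left/right decision via a sigmoid.
        p *= sigmoid(d * score)
    return p

# Toy example: a vocabulary of ~8 words needs log2(8) = 3 decisions,
# versus normalizing over all 8 words in a flat softmax.
print(hierarchical_prob([0.5, -1.2, 2.0], [+1, -1, +1]))
```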

2 min read · From erikbern.com