A 2016 critique of GPU-accelerated deep learning, arguing that GPUs excel at dense, parallelizable operations like convolutional neural networks but struggle with the logarithmic data structures required by techniques like hierarchical softmax in word2vec. The author suggests that for NLP and collaborative filtering at billion-parameter scale, CPU-friendly sparse/logarithmic architectures may outperform brute-force GPU approaches, and proposes hybrid CPU+GPU architectures as a promising research direction.

2 min read · From erikbern.com
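To make the dense-vs-logarithmic contrast concrete, here is a minimal sketch (not from the post; all names such as `hierarchical_softmax_prob` and the toy tree layout are hypothetical) of the hierarchical softmax computation word2vec uses. Scoring one word takes roughly log2(V) sequential, data-dependent steps down a binary tree over the vocabulary, and each step reads an unpredictable row of the node matrix; this pointer-chasing pattern suits a CPU's caches and branch predictors far better than the uniform, dense parallelism GPUs are built for.

```python
import numpy as np

def hierarchical_softmax_prob(word_vec, path_nodes, path_signs, node_vecs):
    """Probability of a target word under hierarchical softmax.

    word_vec:   embedding of the context word, shape (d,)
    path_nodes: indices of inner tree nodes from the root to the word's leaf
    path_signs: +1/-1 per node, encoding the left/right branch taken
    node_vecs:  matrix of inner-node vectors, shape (num_inner_nodes, d)

    The loop is O(log V) but inherently sequential: each iteration
    depends on the previous one and touches a different row of node_vecs.
    """
    prob = 1.0
    for node, sign in zip(path_nodes, path_signs):
        # sigmoid(sign * <word_vec, node_vec>): branch probability at this node
        prob *= 1.0 / (1.0 + np.exp(-sign * (word_vec @ node_vecs[node])))
    return prob

# Hypothetical toy example: a vocabulary of 8 words gives paths of length 3,
# versus a full softmax that would touch all 8 output rows.
rng = np.random.default_rng(0)
d = 16
node_vecs = rng.normal(size=(7, d))  # 7 inner nodes in a full binary tree
word_vec = rng.normal(size=d)
print(hierarchical_softmax_prob(word_vec, [0, 2, 5], [+1, -1, +1], node_vecs))
```

The payoff, and the post's point, is that this turns an O(V) output layer into an O(log V) one, but only by trading the GPU-friendly dense matrix multiply for branchy, memory-bound tree traversal that a CPU handles comparatively well.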