Backpropagation is a leaky abstraction: building effective neural networks requires understanding what the backward pass actually does, not just trusting the framework to compute it. Common pitfalls include vanishing gradients when sigmoid activations saturate, dying ReLU neurons that get stuck at zero output and stop receiving gradient, and exploding gradients in RNNs. A real-world example shows a DQN implementation that clipped values where it meant to clip gradients, which introduced a training bug. Understanding the mechanics of the backward pass helps developers debug such issues, choose sound initialization strategies, implement gradient clipping correctly, and avoid architectural mistakes that prevent learning.
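
Two of these pitfalls can be made concrete with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the post or from any DQN codebase: the function names (`sigmoid_grad`, `clipped_delta_loss_grad`, `huber_loss_grad`) and the `bound` parameter are hypothetical, and the Huber loss is used here as one common way to get a clipped gradient, on the assumption that this is the intended behavior. It shows that the sigmoid's local gradient never exceeds 0.25 and collapses when the unit saturates, and that clipping an error value before squaring it zeroes the gradient outside the clip range.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Local gradient of the sigmoid: s * (1 - s).
    # It peaks at 0.25 (x = 0) and shrinks toward 0 as |x| grows.
    s = sigmoid(x)
    return s * (1.0 - s)


def clipped_delta_loss_grad(delta, bound=1.0):
    # Gradient of 0.5 * clip(delta, -bound, bound) ** 2 with respect to delta.
    # Outside [-bound, bound] the clip is flat, so no gradient flows back.
    if abs(delta) > bound:
        return 0.0
    return delta


def huber_loss_grad(delta, bound=1.0):
    # Gradient of the Huber loss with respect to delta: quadratic inside
    # the bound, linear outside, so the gradient itself is clipped.
    return max(-bound, min(bound, delta))


if __name__ == "__main__":
    print("sigmoid grad at 0: ", sigmoid_grad(0.0))
    print("sigmoid grad at 10:", sigmoid_grad(10.0))
    for delta in (0.5, 3.0, -3.0):
        print(delta, clipped_delta_loss_grad(delta), huber_loss_grad(delta))
```

With value clipping, a large error contributes nothing to the update, which is the opposite of what outlier-robust training wants; with the Huber-style gradient, a large error still pushes the parameters, just with bounded magnitude.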

8m read time · From karpathy.medium.com
Table of contents
Vanishing gradients on sigmoids
Dying ReLUs
Exploding gradients in RNNs
Spotted in the Wild: DQN Clipping
In conclusion
