About Backpropagation
Backpropagation (backward propagation of errors) is the algorithm that makes neural network training practical. After a forward pass computes the network's prediction, backpropagation computes the gradient of the loss function with respect to every weight in the network by applying the chain rule of calculus layer by layer, from output back to input. These gradients tell each weight which direction to adjust to reduce the loss. Combined with an optimizer such as gradient descent, backpropagation lets the network learn from data. Popularized by Rumelhart, Hinton, and Williams in 1986, it remains the foundation of deep learning training.
Complexity Analysis
- Time Complexity: O(Σ n_l × n_{l+1}), where n_l is the number of units in layer l and the sum runs over consecutive layer pairs
- Space Complexity: O(Σ n_l × n_{l+1}) for the weights and their gradients, plus O(Σ n_l) per example for the activations cached during the forward pass
- Difficulty: intermediate
Key Concepts
Chain Rule of Calculus
Backpropagation is an efficient application of the chain rule. To find how a weight in an early layer affects the loss, we multiply the local gradients along the path from the loss back to that weight: ∂Loss/∂w = ∂Loss/∂a × ∂a/∂z × ∂z/∂w.
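The three-factor product above can be traced numerically for a single neuron. The values below (one input, a sigmoid activation, squared-error loss) are made up for illustration; the point is that each factor is a cheap local derivative.

```python
import numpy as np

# Hypothetical single neuron: z = w*x + b, a = sigmoid(z), Loss = (a - y)^2
x, y = 2.0, 1.0          # one input and its target (illustrative values)
w, b = 0.5, 0.0          # current weight and bias

z = w * x + b            # pre-activation
a = 1 / (1 + np.exp(-z)) # sigmoid activation

# Chain rule: dLoss/dw = dLoss/da * da/dz * dz/dw
dL_da = 2 * (a - y)      # derivative of the squared error
da_dz = a * (1 - a)      # derivative of the sigmoid
dz_dw = x                # derivative of w*x + b with respect to w
dL_dw = dL_da * da_dz * dz_dw
```

A finite-difference check on the loss confirms the product matches the true gradient.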
Gradient Descent
Gradients indicate the direction of steepest increase. To minimize the loss, we move weights in the opposite direction of the gradient: w_new = w_old - η × ∂Loss/∂w, where η is the learning rate.
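The update rule can be sketched on a one-dimensional loss whose minimum is known. The loss L(w) = (w − 3)² and the learning rate below are illustrative choices, not part of any particular training setup.

```python
# Minimal sketch: gradient descent on L(w) = (w - 3)^2, whose gradient
# is dL/dw = 2*(w - 3). The learning rate eta is a hypothetical choice.
w = 0.0
eta = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # points in the direction of steepest increase
    w = w - eta * grad   # w_new = w_old - eta * dLoss/dw

# w has converged toward the minimizer at 3
```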
Forward Then Backward
Training requires two passes: a forward pass to compute predictions and the loss, then a backward pass to compute gradients. Both passes have the same asymptotic cost, O(number of weights) per example, though in practice the backward pass is roughly twice as expensive as the forward pass.
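The two passes can be sketched for a tiny two-layer regression network. Layer sizes, data, and the tanh activation are made-up choices; note how the backward pass reuses the activations A1 cached during the forward pass.

```python
import numpy as np

# Illustrative 2-layer network: X -> tanh(X @ W1) -> @ W2 -> Yhat
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 features
Y = rng.normal(size=(4, 1))          # regression targets
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 1)) * 0.1

# Forward pass: compute predictions and the loss
Z1 = X @ W1                          # hidden pre-activation
A1 = np.tanh(Z1)                     # hidden activation (cached for backward)
Yhat = A1 @ W2                       # network output
loss = np.mean((Yhat - Y) ** 2)

# Backward pass: chain rule, layer by layer, output back to input
dYhat = 2 * (Yhat - Y) / len(X)      # dLoss/dYhat for mean squared error
dW2 = A1.T @ dYhat                   # gradient for the output layer
dA1 = dYhat @ W2.T                   # propagate the error to the hidden layer
dZ1 = dA1 * (1 - A1 ** 2)            # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
dW1 = X.T @ dZ1                      # gradient for the first layer
```

Each backward step is a matrix product of the same shape as a forward step, which is why the two passes share the O(Σ n_l × n_{l+1}) cost.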
Common Pitfalls
Learning rate too large
A learning rate that is too large causes weight updates to overshoot the minimum and can make training diverge; one that is too small means painfully slow convergence.
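Both failure modes are easy to see on a quadratic. On L(w) = w², the update w ← w − η·2w multiplies w by (1 − 2η) each step, so η > 1 makes |w| grow (divergence) while a tiny η barely moves it. The step counts and rates below are illustrative.

```python
# Illustrative sketch: gradient descent on L(w) = w^2, gradient 2w.
def run(eta, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w     # each step multiplies w by (1 - 2*eta)
    return w

diverged = abs(run(eta=1.1)) > 1e3    # |1 - 2*eta| > 1: overshoots and blows up
slow = abs(run(eta=0.001))            # still far from the minimum at 0
good = abs(run(eta=0.4))              # converges rapidly
```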
Vanishing gradients in deep networks
When using sigmoid or tanh activations, whose local derivatives are at most 0.25 and 1 respectively, gradients shrink exponentially with depth as they propagate backward through many layers, making early layers almost impossible to train.
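The shrinkage can be observed directly: each sigmoid layer multiplies the backward signal by s·(1 − s) ≤ 0.25, so the gradient magnitude decays roughly geometrically. The depth and inputs below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative sketch: track gradient magnitude through 30 sigmoid layers.
rng = np.random.default_rng(0)
depth = 30
a = rng.normal(size=5)            # arbitrary starting activations
grad = np.ones(5)                 # gradient signal arriving from the loss
magnitudes = []
for _ in range(depth):
    a = sigmoid(a)
    grad = grad * a * (1 - a)     # local sigmoid derivative, at most 0.25
    magnitudes.append(np.abs(grad).max())

# magnitudes decays toward zero: early layers receive almost no signal
```

This is one reason ReLU-family activations, whose derivative is exactly 1 on the active region, became the default in deep networks.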