About Backpropagation
Backpropagation (backward propagation of errors) is the algorithm that makes neural network training practical. After a forward pass computes the network's prediction, backpropagation computes the gradient of the loss function with respect to every weight in the network by applying the chain rule of calculus layer by layer, from output back to input. These gradients tell each weight which direction to adjust to reduce the loss. Combined with an optimizer such as gradient descent, backpropagation lets the network learn from data. Popularized by Rumelhart, Hinton, and Williams in 1986, it remains the foundation of deep learning training.
Complexity Analysis
- Time Complexity: O(Σ n_l × n_{l+1}), where n_l is the number of units in layer l and the sum runs over consecutive layer pairs
- Space Complexity: O(Σ n_l × n_{l+1}) for the weights and their gradients, plus O(Σ n_l) per example for the activations cached during the forward pass
- Difficulty: intermediate
Key Concepts
Chain Rule of Calculus
Backpropagation is an efficient application of the chain rule. To find how a weight in an early layer affects the loss, we multiply the local gradients along the path from the loss back to that weight: ∂Loss/∂w = ∂Loss/∂a × ∂a/∂z × ∂z/∂w.
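The three-factor product above can be traced numerically for a single neuron. The values below (one input, a sigmoid activation, squared-error loss) are made up for illustration; the point is that each factor is a cheap local derivative.

```python
import numpy as np

# Hypothetical single neuron: z = w*x + b, a = sigmoid(z), Loss = (a - y)^2
x, y = 2.0, 1.0          # one input and its target (illustrative values)
w, b = 0.5, 0.0          # current weight and bias

z = w * x + b            # pre-activation
a = 1 / (1 + np.exp(-z)) # sigmoid activation

# Chain rule: dLoss/dw = dLoss/da * da/dz * dz/dw
dL_da = 2 * (a - y)      # derivative of the squared error
da_dz = a * (1 - a)      # derivative of the sigmoid
dz_dw = x                # derivative of w*x + b with respect to w
dL_dw = dL_da * da_dz * dz_dw
```

A finite-difference check on the loss confirms the product matches the true gradient.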
Gradient Descent
Gradients indicate the direction of steepest increase. To minimize the loss, we move weights in the opposite direction of the gradient: w_new = w_old - η × ∂Loss/∂w, where η is the learning rate.
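The update rule can be sketched on a one-dimensional loss whose minimum is known. The loss L(w) = (w − 3)² and the learning rate below are illustrative choices, not part of any particular training setup.

```python
# Minimal sketch: gradient descent on L(w) = (w - 3)^2, whose gradient
# is dL/dw = 2*(w - 3). The learning rate eta is a hypothetical choice.
w = 0.0
eta = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # points in the direction of steepest increase
    w = w - eta * grad   # w_new = w_old - eta * dLoss/dw

# w has converged toward the minimizer at 3
```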
Forward Then Backward
Training requires two passes: a forward pass to compute predictions and the loss, then a backward pass to compute gradients. Both passes have the same asymptotic cost, O(number of weights) per example, though in practice the backward pass is roughly twice as expensive as the forward pass.
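The two passes can be sketched for a tiny two-layer regression network. Layer sizes, data, and the tanh activation are made-up choices; note how the backward pass reuses the activations A1 cached during the forward pass.

```python
import numpy as np

# Illustrative 2-layer network: X -> tanh(X @ W1) -> @ W2 -> Yhat
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 features
Y = rng.normal(size=(4, 1))          # regression targets
W1 = rng.normal(size=(3, 5)) * 0.1
W2 = rng.normal(size=(5, 1)) * 0.1

# Forward pass: compute predictions and the loss
Z1 = X @ W1                          # hidden pre-activation
A1 = np.tanh(Z1)                     # hidden activation (cached for backward)
Yhat = A1 @ W2                       # network output
loss = np.mean((Yhat - Y) ** 2)

# Backward pass: chain rule, layer by layer, output back to input
dYhat = 2 * (Yhat - Y) / len(X)      # dLoss/dYhat for mean squared error
dW2 = A1.T @ dYhat                   # gradient for the output layer
dA1 = dYhat @ W2.T                   # propagate the error to the hidden layer
dZ1 = dA1 * (1 - A1 ** 2)            # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
dW1 = X.T @ dZ1                      # gradient for the first layer
```

Each backward step is a matrix product of the same shape as a forward step, which is why the two passes share the O(Σ n_l × n_{l+1}) cost.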
Common Pitfalls
Learning rate too large
A learning rate that is too large causes weight updates to overshoot the minimum and can make training diverge; one that is too small means painfully slow convergence.
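Both failure modes are easy to see on a quadratic. On L(w) = w², the update w ← w − η·2w multiplies w by (1 − 2η) each step, so η > 1 makes |w| grow (divergence) while a tiny η barely moves it. The step counts and rates below are illustrative.

```python
# Illustrative sketch: gradient descent on L(w) = w^2, gradient 2w.
def run(eta, steps=50, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w     # each step multiplies w by (1 - 2*eta)
    return w

diverged = abs(run(eta=1.1)) > 1e3    # |1 - 2*eta| > 1: overshoots and blows up
slow = abs(run(eta=0.001))            # still far from the minimum at 0
good = abs(run(eta=0.4))              # converges rapidly
```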
Vanishing gradients in deep networks
When using sigmoid or tanh activations, whose local derivatives are at most 0.25 and 1 respectively, gradients shrink exponentially with depth as they propagate backward through many layers, making early layers almost impossible to train.
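The shrinkage can be observed directly: each sigmoid layer multiplies the backward signal by s·(1 − s) ≤ 0.25, so the gradient magnitude decays roughly geometrically. The depth and inputs below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative sketch: track gradient magnitude through 30 sigmoid layers.
rng = np.random.default_rng(0)
depth = 30
a = rng.normal(size=5)            # arbitrary starting activations
grad = np.ones(5)                 # gradient signal arriving from the loss
magnitudes = []
for _ in range(depth):
    a = sigmoid(a)
    grad = grad * a * (1 - a)     # local sigmoid derivative, at most 0.25
    magnitudes.append(np.abs(grad).max())

# magnitudes decays toward zero: early layers receive almost no signal
```

This is one reason ReLU-family activations, whose derivative is exactly 1 on the active region, became the default in deep networks.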