Convolution (2D)
Category: Deep Learning
Difficulty: Intermediate
Time Complexity: O(H × W × K² × C)
Space Complexity: O(H × W + K²)
Overview
Convolution is the core operation in Convolutional Neural Networks (CNNs). A small matrix called a kernel (or filter) slides across a 2D input (such as an image), computing the element-wise product between the kernel and each overlapping patch of the input, then summing the products to produce a single output value. This process generates an output feature map that highlights where certain features (edges, textures, patterns) appear in the input. The key insight of convolution is weight sharing — the same kernel weights are reused across all spatial positions, dramatically reducing the number of parameters compared to a fully connected layer. Convolution was popularized by LeCun et al. (1989) for handwritten digit recognition and remains the backbone of modern computer vision.
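The core arithmetic is small enough to sketch in a few lines of NumPy — a minimal illustration of one convolution step, using the default input's top-left patch and the default kernel from the examples below:

```python
import numpy as np

# One convolution step: element-wise product of the kernel with one
# overlapping input patch, then a sum, gives a single output value.
patch = np.array([[1, 2],
                  [4, 5]])       # top-left 2x2 patch of the default input
kernel = np.array([[1, 0],
                   [0, -1]])     # the default kernel
value = np.sum(patch * kernel)   # 1*1 + 2*0 + 4*0 + 5*(-1) = -4
print(value)                     # -4
```

Repeating this at every position the kernel can occupy fills in the full output feature map.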
Try It
- Web: Open in Eigenvue →
- Python:
import eigenvue

eigenvue.show("convolution")
Default Inputs
{
  "input": [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3]
  ],
  "kernel": [
    [1, 0],
    [0, -1]
  ]
}

Input Examples
Default (4×4 input, 2×2 kernel)
{
  "input": [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3]
  ],
  "kernel": [
    [1, 0],
    [0, -1]
  ]
}

Edge detection (3×3 kernel)
{
  "input": [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
  ],
  "kernel": [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1]
  ]
}

Pseudocode
function convolve2D(input, kernel):
    H, W = dimensions(input)
    kH, kW = dimensions(kernel)
    outH = H - kH + 1
    outW = W - kW + 1
    output = zeros(outH, outW)
    for row in range(outH):
        for col in range(outW):
            sum = 0
            for kr in range(kH):
                for kc in range(kW):
                    sum += input[row+kr][col+kc] * kernel[kr][kc]
            output[row][col] = sum
    return output
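The pseudocode translates almost directly into runnable Python. In the sketch below (the function name `convolve2d` is ours), the two innermost loops are replaced by an equivalent NumPy slice-and-sum over each patch:

```python
import numpy as np

# Runnable version of the pseudocode: stride 1, no padding ("valid").
def convolve2d(inp, kernel):
    inp, kernel = np.asarray(inp), np.asarray(kernel)
    H, W = inp.shape
    kH, kW = kernel.shape
    output = np.zeros((H - kH + 1, W - kW + 1))
    for row in range(output.shape[0]):
        for col in range(output.shape[1]):
            # Dot product of the kernel with the overlapping patch.
            output[row, col] = np.sum(inp[row:row + kH, col:col + kW] * kernel)
    return output

inp = [[1, 2, 3, 0],
       [4, 5, 6, 1],
       [7, 8, 9, 2],
       [0, 1, 2, 3]]
ker = [[1, 0],
       [0, -1]]
out = convolve2d(inp, ker)   # 3x3 feature map for the 4x4 default input
```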
Key Concepts

Kernel Sliding
The kernel slides across the input one position at a time (stride=1). At each position, it computes the dot product between the kernel and the overlapping input patch. This produces one value in the output feature map.
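The sliding can also be made explicit: NumPy's `sliding_window_view` (available in NumPy ≥ 1.20) materializes every patch the kernel visits, after which one dot product per position gives the feature map. A vectorized sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Every overlapping 2x2 patch the kernel visits at stride 1.
inp = np.array([[1, 2, 3, 0],
                [4, 5, 6, 1],
                [7, 8, 9, 2],
                [0, 1, 2, 3]])
ker = np.array([[1, 0],
                [0, -1]])
patches = sliding_window_view(inp, ker.shape)   # shape (3, 3, 2, 2)
out = np.einsum('ijkl,kl->ij', patches, ker)    # dot product at each position
print(out.shape)                                # (3, 3)
```

This computes the same result as the loop version; the window view is just a strided view of the input, so no patches are copied.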
Weight Sharing
The same kernel weights are applied at every position in the input. This means the network detects the same feature regardless of where it appears — a property called translation equivariance.
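Translation equivariance can be checked directly on a toy input (a sketch with an assumed single-pixel image, not one of the examples above): shifting the input by one pixel shifts the output response by the same amount.

```python
import numpy as np

# Valid (no-padding) convolution, stride 1.
def conv_valid(inp, ker):
    kH, kW = ker.shape
    H, W = inp.shape
    return np.array([[np.sum(inp[r:r + kH, c:c + kW] * ker)
                      for c in range(W - kW + 1)]
                     for r in range(H - kH + 1)])

ker = np.array([[1, 0], [0, -1]])
inp = np.zeros((6, 6))
inp[1, 1] = 1.0                                        # single bright pixel
shifted = np.roll(np.roll(inp, 1, axis=0), 1, axis=1)  # pixel moved to (2, 2)

out = conv_valid(inp, ker)
out_shifted = conv_valid(shifted, ker)
# Same response pattern, shifted by one in both directions:
assert np.array_equal(out[:-1, :-1], out_shifted[1:, 1:])
```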
Feature Maps
Each kernel produces one feature map. Using multiple kernels (channels) allows the network to detect different features (edges, textures, shapes) simultaneously.
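A sketch of stacking feature maps from multiple kernels (the second kernel here is an arbitrary illustrative choice, not part of the visualization's inputs):

```python
import numpy as np

# Two 2x2 kernels applied to the same input, each producing its own
# 3x3 feature map; stacking gives a (num_kernels, outH, outW) volume.
def conv_valid(inp, ker):
    kH, kW = ker.shape
    H, W = inp.shape
    return np.array([[np.sum(inp[r:r + kH, c:c + kW] * ker)
                      for c in range(W - kW + 1)]
                     for r in range(H - kH + 1)])

inp = np.arange(16, dtype=float).reshape(4, 4)
kernels = [
    np.array([[1, 0], [0, -1]]),   # the default diagonal-difference kernel
    np.array([[-1, 1], [-1, 1]]),  # an illustrative difference kernel
]
maps = np.stack([conv_valid(inp, k) for k in kernels])
print(maps.shape)   # (2, 3, 3): one feature map per kernel
```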
Common Pitfalls
- Output size reduction: Without padding, convolution reduces spatial dimensions: output_size = input_size - kernel_size + 1. Use padding='same' to maintain dimensions.
- Kernel size must be ≤ input size: The kernel cannot be larger than the input in any dimension. This is a hard constraint of the convolution operation.
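The first pitfall is avoided with zero padding. A minimal 'same'-padding sketch (stride 1; the helper name `conv_same` and the uneven pad split for even kernels are our choices):

```python
import numpy as np

# Zero-pad so the output keeps the input's spatial size at stride 1.
def conv_same(inp, ker):
    kH, kW = ker.shape
    H, W = inp.shape
    # Total padding is kernel_size - 1 per axis, split across both sides.
    padded = np.pad(inp, ((kH // 2, (kH - 1) // 2),
                          (kW // 2, (kW - 1) // 2)))
    return np.array([[np.sum(padded[r:r + kH, c:c + kW] * ker)
                      for c in range(W)]
                     for r in range(H)])

inp = np.ones((5, 5))
ker = np.ones((3, 3))
out = conv_same(inp, ker)
print(out.shape)   # (5, 5) — same as the input
```

Note that border values differ from interior ones (the zeros contribute nothing to the sum), which is why the center here is 9 but the corners are 4.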
Q1: What is the output size of convolving a 5×5 input with a 3×3 kernel (no padding, stride 1)?
- A) 5×5
- B) 4×4
- C) 3×3
- D) 2×2
Answer: C) 3×3
Output size = input_size - kernel_size + 1 = 5 - 3 + 1 = 3. So the output is 3×3.
Further Reading
- Backpropagation Applied to Handwritten Zip Code Recognition (LeCun et al., 1989) (paper)
- Convolution — Wikipedia (article)