
Convolution (2D)

Category: Deep Learning
Difficulty: Intermediate
Time Complexity: O(H × W × K² × C)
Space Complexity: O(H × W + K²)

Convolution is the core operation in Convolutional Neural Networks (CNNs). A small matrix called a kernel (or filter) slides across a 2D input (such as an image), computing the element-wise product between the kernel and each overlapping patch of the input, then summing the products to produce a single output value. This process generates an output feature map that highlights where certain features (edges, textures, patterns) appear in the input. The key insight of convolution is weight sharing — the same kernel weights are reused across all spatial positions, dramatically reducing the number of parameters compared to a fully connected layer. Convolution was popularized by LeCun et al. (1989) for handwritten digit recognition and remains the backbone of modern computer vision.

{
  "input": [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3]
  ],
  "kernel": [
    [1, 0],
    [0, -1]
  ]
}
{
  "input": [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
  ],
  "kernel": [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1]
  ]
}
function convolve2D(input, kernel):
    H, W = dimensions(input)
    kH, kW = dimensions(kernel)
    outH = H - kH + 1
    outW = W - kW + 1
    output = zeros(outH, outW)
    for row in range(outH):
        for col in range(outW):
            sum = 0
            for kr in range(kH):
                for kc in range(kW):
                    sum += input[row+kr][col+kc] * kernel[kr][kc]
            output[row][col] = sum
    return output
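The pseudocode translates directly into runnable Python. The sketch below applies it to both worked examples above (the 4×4 input with the 2×2 difference kernel, and the 5×5 input with the 3×3 edge-detection kernel):

```python
# Direct Python translation of the pseudocode (stride 1, no padding).
def convolve2D(inp, kernel):
    H, W = len(inp), len(inp[0])
    kH, kW = len(kernel), len(kernel[0])
    outH, outW = H - kH + 1, W - kW + 1
    output = [[0] * outW for _ in range(outH)]
    for row in range(outH):
        for col in range(outW):
            s = 0
            for kr in range(kH):
                for kc in range(kW):
                    s += inp[row + kr][col + kc] * kernel[kr][kc]
            output[row][col] = s
    return output

# First example from above: 4x4 input, 2x2 kernel -> 3x3 output.
inp = [[1, 2, 3, 0],
       [4, 5, 6, 1],
       [7, 8, 9, 2],
       [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, -1]]
print(convolve2D(inp, kernel))
# → [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]

# Second example: 5x5 input, 3x3 edge-detection kernel -> 3x3 output.
inp2 = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
kernel2 = [[-1, -1, -1],
           [-1, 8, -1],
           [-1, -1, -1]]
print(convolve2D(inp2, kernel2))
# → [[5, 3, 5], [3, 0, 3], [5, 3, 5]]
```

Note that the edge-detection kernel produces 0 at the uniform interior and large values along the boundary of the block of ones, which is exactly the "highlight where features appear" behavior described above.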

The kernel slides across the input one position at a time (stride=1). At each position, it computes the dot product between the kernel and the overlapping input patch. This produces one value in the output feature map.

The same kernel weights are applied at every position in the input. This means the network detects the same feature regardless of where it appears — a property called translation equivariance.
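Translation equivariance can be checked directly: shifting the input shifts the feature map by the same amount. A small NumPy sketch (the `conv2d` helper is a minimal stride-1, no-padding implementation; the kernel and shift are arbitrary illustrative choices):

```python
import numpy as np

# Minimal 2D "convolution" (cross-correlation, as in the pseudocode):
# stride 1, no padding.
def conv2d(x, k):
    kH, kW = k.shape
    return np.array([[np.sum(x[r:r + kH, c:c + kW] * k)
                      for c in range(x.shape[1] - kW + 1)]
                     for r in range(x.shape[0] - kH + 1)])

k = np.array([[1.0, 0.0], [0.0, -1.0]])
x = np.zeros((6, 6))
x[1, 1] = 1.0                              # a lone "feature" at (1, 1)
shifted = np.roll(x, (2, 2), axis=(0, 1))  # same feature moved to (3, 3)

# Shifting the input shifts the feature map by the same offset:
equivariant = np.array_equal(conv2d(shifted, k),
                             np.roll(conv2d(x, k), (2, 2), axis=(0, 1)))
print(equivariant)  # → True
```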

Each kernel produces one feature map. Using multiple kernels (channels) allows the network to detect different features (edges, textures, shapes) simultaneously.
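In code, a multi-kernel layer is just one convolution per kernel, with the resulting feature maps stacked along a channel axis. A minimal sketch (the two 2×2 kernels here are hypothetical hand-picked examples; a real layer learns its kernel weights):

```python
import numpy as np

# Minimal single-channel convolution (cross-correlation), stride 1, no padding.
def conv2d(x, k):
    kH, kW = k.shape
    return np.array([[np.sum(x[r:r + kH, c:c + kW] * k)
                      for c in range(x.shape[1] - kW + 1)]
                     for r in range(x.shape[0] - kH + 1)])

x = np.arange(16, dtype=float).reshape(4, 4)

# Two illustrative 2x2 kernels detecting different local patterns.
kernels = [
    np.array([[1.0, 0.0], [0.0, -1.0]]),   # diagonal difference
    np.array([[0.0, 1.0], [-1.0, 0.0]]),   # anti-diagonal difference
]

# One feature map per kernel, stacked along a leading channel axis.
feature_maps = np.stack([conv2d(x, k) for k in kernels])
print(feature_maps.shape)  # → (2, 3, 3)
```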

  • Output size reduction: Without padding, convolution reduces spatial dimensions: output_size = input_size - kernel_size + 1. Use padding='same' to maintain dimensions.
  • Kernel size must be ≤ input size: The kernel cannot be larger than the input in any dimension. This is a hard constraint of the convolution operation.

Q1: What is the output size of convolving a 5×5 input with a 3×3 kernel (no padding, stride 1)?

  • A) 5×5
  • B) 4×4
  • C) 3×3
  • D) 2×2

Answer: C) 3×3

Output size = input_size - kernel_size + 1 = 5 - 3 + 1 = 3. So the output is 3×3.