Principle:LaurentMazare Tch rs Transposed Convolution

Knowledge Sources	LaurentMazare_Tch_rs; Dumoulin & Visin, 2016
Domains	Deep Learning, Computer Vision, Signal Processing
Last Updated	2026-02-08 00:00 GMT

Overview

Transposed convolution maps feature maps from lower to higher spatial resolution by applying the transpose of a convolution operation, serving as the learnable upsampling component in encoder-decoder architectures.

Description

Transposed convolution (sometimes imprecisely called "deconvolution") computes the same linear map as the gradient of a standard convolution with respect to its input. Where a standard convolution with stride greater than 1 reduces spatial dimensions, a transposed convolution with the same stride increases them. This makes it a natural choice for the decoder or upsampling portion of architectures that must produce outputs at a higher resolution than their intermediate representations.

The operation can be computed by inserting $s - 1$ zeros between adjacent input elements (for stride $s > 1$), padding the borders, and then applying a stride-1 convolution with the spatially flipped kernel. The kernel weights are learnable parameters, which distinguishes transposed convolution from fixed upsampling methods such as bilinear interpolation and lets the network learn task-specific upsampling behavior.
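The zero-insertion view can be checked against the direct "scatter" view, in which each input element stamps a scaled copy of the kernel into the output. The following is a minimal 1D sketch in plain Rust, assuming a single channel, no padding, no output padding, and a non-empty input; the function names are hypothetical and are not part of the tch-rs API.

```rust
/// Direct ("scatter") form: each input element adds a scaled copy of the
/// kernel to the output, shifted by `stride` positions per input index.
/// Assumes `x` and `w` are non-empty.
fn conv_transpose1d_scatter(x: &[f64], w: &[f64], stride: usize) -> Vec<f64> {
    let mut out = vec![0.0; (x.len() - 1) * stride + w.len()];
    for (i, &xi) in x.iter().enumerate() {
        for (j, &wj) in w.iter().enumerate() {
            out[i * stride + j] += xi * wj;
        }
    }
    out
}

/// Zero-insertion form: place the inputs `stride` apart (zeros in between),
/// pad with `k - 1` zeros on each side, then slide the flipped kernel over
/// the result with stride 1.
fn conv_transpose1d_zero_insert(x: &[f64], w: &[f64], stride: usize) -> Vec<f64> {
    let k = w.len();
    let dilated_len = (x.len() - 1) * stride + 1;
    let mut padded = vec![0.0; dilated_len + 2 * (k - 1)];
    for (i, &xi) in x.iter().enumerate() {
        padded[(k - 1) + i * stride] = xi;
    }
    let flipped: Vec<f64> = w.iter().rev().copied().collect();
    padded
        .windows(k)
        .map(|win| win.iter().zip(&flipped).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}
```

For the input `[1, 2, 3]`, kernel `[1, 10, 100]`, and stride 2, both functions return `[1, 10, 102, 20, 203, 30, 300]`: the output has length $(3 - 1) \times 2 + 3 = 7$, and the positions where two kernel copies overlap (indices 2 and 4) receive two contributions.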

Key parameters that control the output spatial dimensions include:

Kernel size -- The spatial extent of the learnable filter
Stride -- Determines the upsampling factor; a stride of 2 approximately doubles spatial dimensions
Padding -- Controls how the input boundaries are handled
Output padding -- Adds extra size to one side of the output, resolving the ambiguity that arises because a strided standard convolution maps several input sizes to the same output size

Transposed convolutions are widely used in image segmentation (mapping from features back to pixel-level predictions), generative models (producing images from latent vectors), and super-resolution (increasing image resolution).

Usage

Apply transposed convolution when:

Upsampling feature maps in decoder networks or generative architectures
The upsampling operation should be learnable rather than fixed
Building symmetric encoder-decoder architectures where each downsampling convolution has a corresponding upsampling layer
Producing dense per-pixel outputs from spatially reduced feature representations

Theoretical Basis

Relationship to Standard Convolution

If a standard convolution is represented as matrix multiplication $y = C x$ where $C$ is the convolution matrix, then the transposed convolution computes:

$x' = C^{T} y$

This is not the inverse of convolution (it does not recover the original input), but rather the transpose of the linear mapping.
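The matrix view can be made concrete with a small 1D sketch. The code below builds an explicit convolution matrix $C$ for a single-channel, unpadded, strided convolution, applies it, and then applies $C^{T}$. The helper names are hypothetical and chosen for illustration only; real implementations do not materialize $C$.

```rust
/// Builds the matrix C of an unpadded 1D convolution (cross-correlation)
/// over an input of length `n` with kernel `w` and the given stride.
/// Row r holds the kernel starting at column r * stride.
fn conv_matrix(n: usize, w: &[f64], stride: usize) -> Vec<Vec<f64>> {
    let rows = (n - w.len()) / stride + 1;
    let mut c = vec![vec![0.0; n]; rows];
    for r in 0..rows {
        for (j, &wj) in w.iter().enumerate() {
            c[r][r * stride + j] = wj;
        }
    }
    c
}

/// y = C x  (standard convolution: long input, short output).
fn matvec(c: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    c.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// x' = C^T y  (transposed convolution: short input, long output).
fn matvec_transposed(c: &[Vec<f64>], y: &[f64]) -> Vec<f64> {
    let mut out = vec![0.0; c[0].len()];
    for (row, &yr) in c.iter().zip(y) {
        for (o, &a) in out.iter_mut().zip(row) {
            *o += a * yr;
        }
    }
    out
}
```

With `n = 5`, kernel `[1, 2, 1]`, and stride 2, the input `x = [1, 2, 3, 4, 5]` gives `y = [8, 16]`, and applying the transpose gives `x' = [8, 16, 24, 32, 16]`. The result has the shape of `x` but not its values, which illustrates that the transpose restores the spatial size rather than the original signal.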

Output Size Calculation

For a transposed convolution with input size $i$, kernel size $k$, stride $s$, padding $p$, and output padding $o_{p}$ (and dilation 1), the output size $o$ is:

$o = (i - 1) \times s - 2 p + k + o_{p}$
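As a quick check of the formula, here are two worked cases with illustrative values (not taken from the source): an input of size $i = 4$ with stride $s = 2$ and padding $p = 1$.

$k = 4,\ o_{p} = 0: \quad o = (4 - 1) \times 2 - 2 \times 1 + 4 + 0 = 8$

$k = 3,\ o_{p} = 0: \quad o = (4 - 1) \times 2 - 2 \times 1 + 3 + 0 = 7$

With $k = 4$ the output is exactly twice the input size. With $k = 3$ it falls one short, and setting $o_{p} = 1$ brings it to 8.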

This is the dual of the standard convolution output size formula:

$i = \left\lfloor \frac{o + 2 p - k}{s} \right\rfloor + 1$
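The floor in the standard formula is what makes output padding necessary: several larger sizes collapse to the same smaller size, so the transposed direction needs one more parameter to pick among them. The sketch below encodes both formulas in plain Rust. The function names are hypothetical, and the check that output padding must be smaller than the stride is an assumption modeled on PyTorch-style APIs with dilation 1.

```rust
/// Standard convolution output size: floor((n + 2p - k) / s) + 1.
fn conv_out_size(n: i64, k: i64, s: i64, p: i64) -> i64 {
    (n + 2 * p - k).div_euclid(s) + 1
}

/// Transposed convolution output size: (i - 1) * s - 2p + k + op.
fn conv_transpose_out_size(i: i64, k: i64, s: i64, p: i64, op: i64) -> i64 {
    (i - 1) * s - 2 * p + k + op
}

/// Output padding needed to reach `target` from input size `i`, if any.
/// Assumption: valid output padding lies in the range 0..s.
fn required_output_padding(target: i64, i: i64, k: i64, s: i64, p: i64) -> Option<i64> {
    let op = target - conv_transpose_out_size(i, k, s, p, 0);
    if (0..s).contains(&op) {
        Some(op)
    } else {
        None
    }
}
```

With $k = 3$, $s = 2$, $p = 1$, sizes 7 and 8 both convolve down to 4. Going back up from 4, `conv_transpose_out_size` returns 7 with output padding 0 and 8 with output padding 1, so `required_output_padding(8, 4, 3, 2, 1)` yields `Some(1)`.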

Checkerboard Artifacts

When the kernel size is not evenly divisible by the stride, transposed convolutions can produce checkerboard artifacts -- regular patterns in the output caused by uneven overlap of the kernel at different output positions. This can be mitigated by choosing kernel sizes that are multiples of the stride, or by using resize-convolution (bilinear upsampling followed by standard convolution) as an alternative.
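The uneven overlap can be seen by counting how many kernel taps land on each output position. The following is a minimal 1D sketch with no padding; the function name is hypothetical. It captures only the overlap pattern, not the learned weights, so uniform counts reduce this source of artifacts but should not be read as a guarantee that none appear.

```rust
/// For a 1D transposed convolution over `n` inputs with kernel size `k`
/// and the given stride (no padding), returns how many kernel taps
/// contribute to each output position. Assumes `n >= 1`.
fn overlap_counts(n: usize, k: usize, stride: usize) -> Vec<usize> {
    let mut counts = vec![0; (n - 1) * stride + k];
    for i in 0..n {
        for j in 0..k {
            counts[i * stride + j] += 1;
        }
    }
    counts
}
```

For 4 inputs and stride 2, a kernel of size 3 gives `[1, 1, 2, 1, 2, 1, 2, 1, 1]`, an alternating pattern in the interior, whereas a kernel of size 4 gives `[1, 1, 2, 2, 2, 2, 2, 2, 1, 1]`, which is uniform away from the borders.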

Multi-Dimensional Extension

Transposed convolution generalizes to 1D (temporal upsampling), 2D (spatial upsampling for images), and 3D (volumetric upsampling for video or medical imaging) in the same manner, with the output size formula applied independently along each dimension.

Related Pages

Implementation:LaurentMazare_Tch_rs_Conv_Transpose
