Principle:LaurentMazare Tch rs Transposed Convolution
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Computer Vision, Signal Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Transposed convolution maps feature maps from lower to higher spatial resolution by applying the transpose of a convolution operation, serving as the learnable upsampling component in encoder-decoder architectures.
Description
Transposed convolution (sometimes imprecisely called "deconvolution") is the operation used to compute the gradient of a standard convolution with respect to its input. A standard convolution with stride greater than 1 reduces spatial dimensions by roughly the stride factor. A transposed convolution with the same stride increases them by roughly that factor. This makes it the natural choice for the decoder or upsampling portion of architectures that must produce outputs at a higher resolution than their intermediate representations.
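For stride s > 1, the operation can be implemented in three steps:
1. Insert s - 1 zeros between adjacent input elements.
2. Pad each border with (kernel size - 1 - padding) zeros.
3. Apply an ordinary stride-1 convolution with the spatially flipped kernel. In the multi-channel case, the input and output channel axes of the kernel are also swapped.

The kernel weights are learnable parameters, which distinguishes transposed convolution from fixed upsampling methods such as bilinear interpolation. This lets the network learn task-specific upsampling behavior.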
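The zero-insertion view can be checked numerically. Below is a minimal sketch in plain Rust (standard library only, not the tch-rs API). It assumes:
- a single channel, no batch dimension, and dilation 1;
- PyTorch-style conventions, where convolution is cross-correlation and `padding` crops the transposed output.

The function names `conv_transpose1d_scatter` and `conv_transpose1d_zero_insert` are hypothetical. The sketch compares two forms. In the direct form, each input element scatters a scaled copy of the kernel into the output. In the other, zeros are inserted and an ordinary convolution is applied with the flipped kernel.

```rust
/// Direct (scatter) form of a 1-D, single-channel transposed convolution:
/// input element i adds x[i] * w[j] to output position i * stride + j - padding.
fn conv_transpose1d_scatter(
    x: &[f64],
    w: &[f64],
    stride: usize,
    padding: usize,
    output_padding: usize,
) -> Vec<f64> {
    let (n, k) = (x.len(), w.len());
    let out_len = (n - 1) * stride + k + output_padding - 2 * padding;
    let mut out = vec![0.0; out_len];
    for i in 0..n {
        for j in 0..k {
            // Position after cropping `padding` elements from the left border.
            let t = (i * stride + j) as isize - padding as isize;
            if t >= 0 && (t as usize) < out_len {
                out[t as usize] += x[i] * w[j];
            }
        }
    }
    out
}

/// Same result via zero insertion + border padding + stride-1 convolution
/// with the flipped kernel. Assumes padding <= kernel_size - 1.
fn conv_transpose1d_zero_insert(
    x: &[f64],
    w: &[f64],
    stride: usize,
    padding: usize,
    output_padding: usize,
) -> Vec<f64> {
    let (n, k) = (x.len(), w.len());
    assert!(padding < k, "sketch assumes padding <= kernel_size - 1");
    let edge = k - 1 - padding;
    // Zero-stuffed input: (stride - 1) zeros between elements, `edge` zeros on
    // each border, plus `output_padding` extra zeros on the right.
    let mut padded = vec![0.0; edge + (n - 1) * stride + 1 + edge + output_padding];
    for i in 0..n {
        padded[edge + i * stride] = x[i];
    }
    // Ordinary stride-1 cross-correlation with the spatially flipped kernel.
    let out_len = padded.len() - k + 1;
    (0..out_len)
        .map(|t| (0..k).map(|j| padded[t + j] * w[k - 1 - j]).sum::<f64>())
        .collect()
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let w = [1.0, 0.5, -1.0];
    // (stride, padding, output_padding) combinations to compare.
    for &(s, p, op) in &[(1, 0, 0), (2, 0, 0), (2, 1, 1), (3, 1, 0)] {
        let a = conv_transpose1d_scatter(&x, &w, s, p, op);
        let b = conv_transpose1d_zero_insert(&x, &w, s, p, op);
        assert_eq!(a.len(), b.len());
        assert!(a.iter().zip(&b).all(|(u, v)| (u - v).abs() < 1e-12));
        println!("stride={s} padding={p} output_padding={op}: {a:?}");
    }
}
```

The scatter form is usually the clearer mental model. The zero-insertion form shows why a transposed convolution can reuse an ordinary convolution routine, at the cost of multiplying many inserted zeros.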
Key parameters that control the output spatial dimensions include:
- Kernel size -- The spatial extent of the learnable filter
- Stride -- Determines the upsampling factor; a stride of 2 approximately doubles spatial dimensions
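- Padding -- Crops this many elements from each border of the output. It equals the padding of the forward convolution being transposed, so larger padding yields a smaller output
- Output padding -- Adds extra elements to one side of the output. A strided convolution can map several input sizes to the same output size, and output padding selects which of them the transposed convolution restores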
Transposed convolutions are widely used in image segmentation (mapping from features back to pixel-level predictions), generative models (producing images from latent vectors), and super-resolution (increasing image resolution).
Usage
Apply transposed convolution when:
- Upsampling feature maps in decoder networks or generative architectures
- The upsampling operation should be learnable rather than fixed
- Building symmetric encoder-decoder architectures where each downsampling convolution has a corresponding upsampling layer
- Producing dense per-pixel outputs from spatially reduced feature representations
Theoretical Basis
Relationship to Standard Convolution
If a standard convolution is represented as the matrix multiplication $y = Cx$, where $x$ and $y$ are the flattened input and output and $C$ is the convolution matrix, then the transposed convolution computes:
$$x' = C^{\top} y$$
This is not the inverse of convolution (it does not recover the original input), but rather the transpose of the linear mapping.
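The matrix relationship can be made concrete for a small 1-D case. This sketch uses plain Rust, hypothetical helper names, a single channel, and dilation 1. It proceeds in three steps:
1. Build the dense matrix $C$ of a stride-2, padding-1 convolution with an all-ones kernel of size 3.
2. Apply $C^\top$ to the convolution output.
3. Check that the result matches the scatter form of the transposed convolution, yet differs from the original input.

```rust
/// Dense matrix C of a 1-D, single-channel strided convolution (cross-correlation),
/// so that y = C x. Row o holds w[j] in column o * stride + j - padding.
fn conv_matrix(w: &[f64], input_len: usize, stride: usize, padding: usize) -> Vec<Vec<f64>> {
    let k = w.len();
    let out_len = (input_len + 2 * padding - k) / stride + 1;
    let mut c = vec![vec![0.0; input_len]; out_len];
    for o in 0..out_len {
        for j in 0..k {
            let col = (o * stride + j) as isize - padding as isize;
            if col >= 0 && (col as usize) < input_len {
                c[o][col as usize] = w[j];
            }
        }
    }
    c
}

fn transpose(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    (0..a[0].len())
        .map(|j| a.iter().map(|row| row[j]).collect::<Vec<f64>>())
        .collect()
}

fn matvec(a: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    a.iter()
        .map(|row| row.iter().zip(v).map(|(r, x)| r * x).sum::<f64>())
        .collect()
}

/// Scatter form of the transposed convolution (output_padding = 0).
fn conv_transpose1d(y: &[f64], w: &[f64], stride: usize, padding: usize) -> Vec<f64> {
    let out_len = (y.len() - 1) * stride + w.len() - 2 * padding;
    let mut out = vec![0.0; out_len];
    for (o, &yo) in y.iter().enumerate() {
        for (j, &wj) in w.iter().enumerate() {
            let t = (o * stride + j) as isize - padding as isize;
            if t >= 0 && (t as usize) < out_len {
                out[t as usize] += yo * wj;
            }
        }
    }
    out
}

fn main() {
    let w = [1.0, 1.0, 1.0];
    let x: Vec<f64> = (1..=7).map(|v| v as f64).collect();
    let c = conv_matrix(&w, x.len(), 2, 1); // 4 x 7: maps length 7 to length 4
    let y = matvec(&c, &x); // [3.0, 9.0, 15.0, 13.0]
    let z = matvec(&transpose(&c), &y); // [3.0, 12.0, 9.0, 24.0, 15.0, 28.0, 13.0]
    // C^T y is exactly the transposed convolution of y ...
    assert_eq!(z, conv_transpose1d(&y, &w, 2, 1));
    // ... and has the shape of x, but it does not give x back.
    assert_eq!(z.len(), x.len());
    assert_ne!(z, x);
}
```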
Output Size Calculation
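For a transposed convolution with input size $n_{\text{in}}$, kernel size $k$, stride $s$, padding $p$, and output padding $p_{\text{out}}$ (dilation 1):
$$n_{\text{out}} = (n_{\text{in}} - 1)\,s - 2p + k + p_{\text{out}}$$
This is the dual of the standard convolution output size formula, in which $n_{\text{in}}$ and $n_{\text{out}}$ denote that convolution's input and output:
$$n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1$$
Solving the standard formula for its input size recovers the transposed formula only up to the $s - 1$ values discarded by the floor. This is the ambiguity that $p_{\text{out}}$ resolves.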
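The two size formulas, and the ambiguity that output padding resolves, can be checked with a few lines of integer arithmetic. The helper names below are hypothetical, dilation is assumed to be 1, and integer division plays the role of the floor.

```rust
/// Output length of a standard convolution (dilation 1); integer division is the floor.
fn conv_out_len(n_in: usize, k: usize, s: usize, p: usize) -> usize {
    (n_in + 2 * p - k) / s + 1
}

/// Output length of a transposed convolution (dilation 1).
fn conv_transpose_out_len(n_in: usize, k: usize, s: usize, p: usize, output_padding: usize) -> usize {
    (n_in - 1) * s + k + output_padding - 2 * p
}

fn main() {
    // A common stride-2 encoder layer: kernel 3, padding 1.
    let (k, s, p) = (3, 2, 1);
    // Both 7 and 8 shrink to 4, so the size alone does not say which one to restore.
    assert_eq!(conv_out_len(7, k, s, p), 4);
    assert_eq!(conv_out_len(8, k, s, p), 4);
    // Output padding picks between the candidates.
    assert_eq!(conv_transpose_out_len(4, k, s, p, 0), 7);
    assert_eq!(conv_transpose_out_len(4, k, s, p, 1), 8);
    // Kernel 4, stride 2, padding 1 gives an exact 2x upsampling.
    assert_eq!(conv_transpose_out_len(16, 4, 2, 1, 0), 32);
}
```

In PyTorch-style APIs, output padding must be smaller than the stride (or the dilation), which matches the $s - 1$ extra sizes the floor can hide.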
Checkerboard Artifacts
When the kernel size is not evenly divisible by the stride, transposed convolutions can produce checkerboard artifacts. These are regular patterns in the output caused by uneven overlap of the kernel: some output positions receive contributions from more kernel taps than their neighbors.

Two mitigations are common:
- Choose kernel sizes that are multiples of the stride. This evens out the overlap, though learned weights can still produce some artifacts.
- Use resize-convolution instead: nearest-neighbor or bilinear upsampling followed by a standard convolution.
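Uneven overlap becomes visible when you count how many (input element, kernel tap) pairs land on each output position. This count is also the output you get when both the input and the kernel are all ones. The sketch below makes simplifying assumptions (1-D, no padding, dilation 1), and `overlap_counts` is a hypothetical helper.

```rust
/// Counts how many (input element, kernel tap) pairs land on each output
/// position of a 1-D transposed convolution with no padding and dilation 1.
/// This equals the output for an all-ones input and an all-ones kernel.
fn overlap_counts(n_in: usize, kernel_size: usize, stride: usize) -> Vec<usize> {
    let mut counts = vec![0; (n_in - 1) * stride + kernel_size];
    for i in 0..n_in {
        for j in 0..kernel_size {
            counts[i * stride + j] += 1;
        }
    }
    counts
}

fn main() {
    // Kernel 3 is not a multiple of stride 2.
    // Prints: k=3, s=2: [1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1]
    println!("k=3, s=2: {:?}", overlap_counts(5, 3, 2));
    // Kernel 4 is a multiple of stride 2.
    // Prints: k=4, s=2: [1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1]
    println!("k=4, s=2: {:?}", overlap_counts(5, 4, 2));
}
```

With kernel size 3 and stride 2, the interior alternates between 1 and 2 contributions, which is the 1-D analogue of the checkerboard. With kernel size 4 and stride 2, the count is a uniform 2 away from the borders.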
Multi-Dimensional Extension
Transposed convolution generalizes to 1D (temporal upsampling), 2D (spatial upsampling for images), and 3D (volumetric upsampling for video or medical imaging) in the same manner, with the output size formula applied independently along each dimension.
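As an illustration of the per-axis rule, the sketch below is a naive single-channel 2-D transposed convolution. It applies the same scatter rule and the same size formula along height and width separately. The function name is hypothetical; dilation is assumed to be 1, and stride, padding, and output padding are given independently for each axis.

```rust
/// Naive single-channel 2-D transposed convolution (dilation 1).
/// Each tuple argument is (height, width); the two axes are handled independently.
fn conv_transpose2d(
    x: &[Vec<f64>],
    kernel: &[Vec<f64>],
    stride: (usize, usize),
    padding: (usize, usize),
    output_padding: (usize, usize),
) -> Vec<Vec<f64>> {
    let (h_in, w_in) = (x.len(), x[0].len());
    let (kh, kw) = (kernel.len(), kernel[0].len());
    // The 1-D output size formula, applied to each axis separately.
    let h_out = (h_in - 1) * stride.0 + kh + output_padding.0 - 2 * padding.0;
    let w_out = (w_in - 1) * stride.1 + kw + output_padding.1 - 2 * padding.1;
    let mut out = vec![vec![0.0; w_out]; h_out];
    for i in 0..h_in {
        for j in 0..w_in {
            for a in 0..kh {
                for b in 0..kw {
                    // Scatter position, cropped by `padding` on each axis.
                    let r = (i * stride.0 + a) as isize - padding.0 as isize;
                    let c = (j * stride.1 + b) as isize - padding.1 as isize;
                    if r >= 0 && c >= 0 && (r as usize) < h_out && (c as usize) < w_out {
                        out[r as usize][c as usize] += x[i][j] * kernel[a][b];
                    }
                }
            }
        }
    }
    out
}

fn main() {
    let x = vec![vec![1.0; 5]; 3]; // 3 x 5 input
    let kernel = vec![vec![1.0; 3]; 4]; // 4 x 3 kernel
    let out = conv_transpose2d(&x, &kernel, (2, 1), (1, 1), (0, 0));
    // Height: (3 - 1) * 2 + 4 - 2 * 1 = 6; width: (5 - 1) * 1 + 3 - 2 * 1 = 5.
    assert_eq!((out.len(), out[0].len()), (6, 5));
}
```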