Principle:Kornia Kornia Augmentation Transforms

From Leeroopedia


Knowledge Sources:
  • Paper: "Data Augmentation for Low Resource Neural Machine Translation" and general augmentation literature
  • Kornia
Domains: Vision, Augmentation, Deep_Learning
Last Updated: 2026-02-09 15:00 GMT

Overview

A technique that applies random geometric and photometric transformations to training images to improve model generalization and robustness.

Description

Data augmentation artificially expands training datasets by applying stochastic transforms. These transforms fall into two broad categories:

  • Geometric augmentations (affine, perspective, rotation) modify the spatial layout of the image.
  • Photometric augmentations (brightness, contrast, color jiggle) modify pixel intensities without altering spatial structure.

Each transform is applied with probability p, which controls how often it fires; how strongly it perturbs the image is governed by the transform's own parameter ranges. In differentiable augmentation, the transforms are built from differentiable tensor operations, so gradients flow through them, enabling end-to-end optimization of augmentation parameters or of the downstream model.
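
To illustrate how gradients can flow through a transform, here is a minimal sketch in plain PyTorch. It is not Kornia's implementation: the `adjust_brightness` helper, the learnable `delta` offset, and the target mean are hypothetical names and values chosen only for illustration.

```python
import torch


def adjust_brightness(image: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Hypothetical photometric transform: add a brightness offset to every pixel."""
    return image + delta


# A constant gray batch with shape (batch, channels, height, width).
image = torch.full((1, 3, 4, 4), 0.5)

# Treat the brightness offset as a learnable augmentation parameter.
delta = torch.tensor(0.1, requires_grad=True)

# Toy objective: push the mean intensity of the augmented image toward 0.8.
target_mean = 0.8
loss = (adjust_brightness(image, delta).mean() - target_mean) ** 2
loss.backward()

# Because the transform is an ordinary tensor operation, autograd reaches delta.
print(delta.grad)
```

The gradient is negative here, so a gradient descent step would increase `delta` and brighten the image toward the target.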

The augmentation module uses a base class hierarchy:

  • _BasicAugmentationBase provides probability control and parameter generation.
  • _AugmentationBase adds transformation matrix tracking for invertibility, allowing augmented data to be mapped back to the original coordinate space.
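
The division of responsibilities described above can be sketched with a small, dependency-free hierarchy. This is an illustrative assumption about the general pattern, not Kornia's source code: the class names `BasicAugmentation`, `MatrixTrackingAugmentation`, and `RandomTranslation`, and the choice to operate on a single 2D point instead of an image tensor, are all hypothetical.

```python
import random

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


class BasicAugmentation:
    """Hypothetical base class: probability gate and parameter sampling."""

    def __init__(self, p=0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)

    def generate_parameters(self):
        raise NotImplementedError

    def apply(self, point, params):
        raise NotImplementedError

    def __call__(self, point):
        # Bernoulli(p) gate: parameters are sampled only when the draw succeeds.
        self.params = self.generate_parameters() if self.rng.random() < self.p else None
        return point if self.params is None else self.apply(point, self.params)


class MatrixTrackingAugmentation(BasicAugmentation):
    """Hypothetical subclass: records the 3x3 matrix used by the last call."""

    def compute_matrix(self, params):
        raise NotImplementedError

    def apply(self, point, params):
        m = self.compute_matrix(params)
        x, y = point
        return (m[0][0] * x + m[0][1] * y + m[0][2],
                m[1][0] * x + m[1][1] * y + m[1][2])

    def __call__(self, point):
        out = super().__call__(point)
        # The identity matrix is recorded when the transform was skipped.
        self.transform_matrix = (
            IDENTITY if self.params is None else self.compute_matrix(self.params)
        )
        return out


class RandomTranslation(MatrixTrackingAugmentation):
    """Shifts a point by a random offset drawn from [-max_shift, max_shift]."""

    def __init__(self, max_shift, p=0.5, seed=None):
        super().__init__(p=p, seed=seed)
        self.max_shift = max_shift

    def generate_parameters(self):
        return {"dx": self.rng.uniform(-self.max_shift, self.max_shift),
                "dy": self.rng.uniform(-self.max_shift, self.max_shift)}

    def compute_matrix(self, params):
        return [[1.0, 0.0, params["dx"]], [0.0, 1.0, params["dy"]], [0.0, 0.0, 1.0]]


aug = RandomTranslation(max_shift=5.0, p=1.0, seed=0)
moved = aug((10.0, 20.0))
# Undo the shift with the recorded matrix (a translation inverts by negating its offsets).
restored = (moved[0] - aug.transform_matrix[0][2],
            moved[1] - aug.transform_matrix[1][2])
```

Keeping the probability gate in the base class means every transform shares the same skip logic, while only geometric transforms pay the cost of building and storing a matrix.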

Usage

Use when training vision models to reduce overfitting and increase robustness to input variations. Particularly valuable for small datasets and transfer learning scenarios, where the available training data does not cover the full distribution of expected inputs.

Theoretical Basis

Augmentation acts as an implicit regularizer by presenting different views of the same training example. Each transform T_i is applied with probability p_i:

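# Probabilistic transform application
import random

def apply_with_probability(x, T_i, p_i):
    # random.random() < p_i succeeds with probability p_i, i.e. a Bernoulli(p_i) draw
    if random.random() < p_i:
        x_prime = T_i(x)
    else:
        x_prime = x
    return x_prime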

Formally: x' = T_i(x) if Bernoulli(p_i) = 1, else x' = x.

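Geometric transforms are represented as 3×3 matrices in homogeneous coordinates (affine transforms are the special case whose last row is [0, 0, 1]; perspective transforms use the general form), enabling composition and inversion: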

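# Transform matrix composition: M_1 is applied first, M_n last
M_composed = M_n @ ... @ M_2 @ M_1
# A single inverse undoes the whole chain (x is a homogeneous column vector)
x_original = M_composed.inverse() @ x_augmented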

This matrix representation allows the pipeline to track exactly what spatial transformation was applied and to reverse it when needed (e.g., mapping predicted coordinates back to the original image space).
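
The composition and inversion above can be checked numerically. The following is a minimal NumPy sketch, assuming 2D points in homogeneous coordinates; the `rotation` and `translation` helpers are hypothetical names introduced for this example and are not part of Kornia.

```python
import numpy as np


def rotation(degrees: float) -> np.ndarray:
    """3x3 homogeneous matrix for a counter-clockwise rotation about the origin."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def translation(dx: float, dy: float) -> np.ndarray:
    """3x3 homogeneous matrix for a shift by (dx, dy)."""
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


# Applied in order: rotate first (M_1), then translate (M_2).
M_1 = rotation(90.0)
M_2 = translation(5.0, -2.0)
M_composed = M_2 @ M_1

# The point (1, 0) written as a homogeneous column vector.
x = np.array([1.0, 0.0, 1.0])

# Forward: a single matrix product applies the whole chain.
x_augmented = M_composed @ x

# Backward: a single inverse maps the result to the original coordinates.
x_original = np.linalg.inv(M_composed) @ x_augmented

print(np.round(x_augmented, 6))
print(np.round(x_original, 6))
```

The rightmost matrix acts first, so swapping the order of `M_1` and `M_2` would give a different composed transform.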
