Principle:NVIDIA DALI GridMask Augmentation

Knowledge Sources	NVIDIA DALI Documentation
Domains	Object_Detection, GPU_Computing
Last Updated	2026-02-08 00:00 GMT

Overview

GridMask augmentation is a structured information-dropping technique that overlays a regular grid of masked (zeroed) regions onto an image to improve model generalization.

Description

GridMask Augmentation is a regularization method introduced to address the limitations of random erasing and cutout techniques. Instead of masking a single contiguous rectangular region, GridMask removes information in a spatially structured pattern: a grid of evenly spaced square holes rotated by a random angle. This forces the model to rely on a wider distribution of spatial features rather than memorizing specific local patterns.

The key parameters of GridMask are:

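Ratio: Controls how much of each grid cell is masked, expressed as the side length of the masked square relative to the tile size. A ratio of 0 means no masking; a ratio approaching 1 means nearly the entire cell is masked. In practice, a moderate ratio (e.g., 0.4, which masks 16% of each cell's area) is used.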
Tile size: The size of each grid cell in pixels. Larger tiles create coarser masks, while smaller tiles create finer ones. The tile size is typically randomized between bounds that depend on the image dimensions.
Angle: A random rotation angle applied to the grid pattern, preventing the mask from aligning with the regular structure of the image (e.g., edges of buildings, grid-like textures).

GridMask is applied stochastically: a coin flip determines whether each sample in the batch receives the augmentation. This probabilistic application prevents the model from adapting to the fixed presence of the grid pattern.
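One simple way to express this per-sample gating, sketched here in plain Python, is to multiply the ratio by the outcome of a coin flip, since a ratio of 0 disables masking. The function name `sample_ratio`, the probability of 0.5, and the base ratio of 0.4 are illustrative assumptions, not part of the DALI API.

```python
import random

def sample_ratio(rng, base_ratio=0.4, probability=0.5):
    """Return base_ratio with the given probability, otherwise 0.0 (no masking)."""
    coin = 1 if rng.random() < probability else 0
    return base_ratio * coin

rng = random.Random(0)  # seeded only to make the sketch reproducible
batch_ratios = [sample_ratio(rng) for _ in range(8)]  # one independent draw per sample
```

Each entry of `batch_ratios` is either 0.0 (sample left untouched) or 0.4 (sample masked), so the decision is made independently for every sample in the batch.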

Unlike spatial augmentations such as cropping or flipping, GridMask does not modify bounding box annotations because it only changes pixel values, not the spatial layout of objects.

Usage

Use this principle when training object detection or image classification models that overfit on the spatial structure of training images. GridMask works best in combination with other augmentations and should be applied before normalization in the pipeline.

Theoretical Basis

The GridMask pattern is defined by a binary mask M(x, y) over image coordinates:

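M(x, y) = 1  if (x mod tile) < tile * ratio  AND  (y mod tile) < tile * ratio
M(x, y) = 0  otherwise

Both conditions must hold, so the masked region within each tile is a square of side tile * ratio.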

The mask is then rotated by angle theta:

x' = x * cos(theta) - y * sin(theta)
y' = x * sin(theta) + y * cos(theta)
M_rotated(x, y) = M(x', y')

The augmented image is:

image' = image * (1 - M_rotated)
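The mask definition, rotation, and application can be sketched as a plain-Python reference. This is not the DALI operator: the names `grid_mask` and `apply_mask` are hypothetical, the image is a single-channel nested list, and the sketch assumes a pixel is masked only when both rotated coordinates fall inside the masked part of the tile, which is what yields square holes.

```python
import math

def grid_mask(height, width, tile, ratio, theta=0.0):
    """Build M_rotated as nested lists: 1 marks a masked pixel, 0 a kept one."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    hole = tile * ratio  # side length of the masked square inside each tile
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Rotate the pixel coordinates before looking up the grid.
            xr = x * cos_t - y * sin_t
            yr = x * sin_t + y * cos_t
            # Assumption: masked only when BOTH coordinates are inside the hole.
            masked = (xr % tile) < hole and (yr % tile) < hole
            row.append(1 if masked else 0)
        mask.append(row)
    return mask

def apply_mask(image, mask):
    """Compute image * (1 - M_rotated) for a single-channel image."""
    return [[pixel * (1 - m) for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

image = [[255] * 8 for _ in range(8)]
mask = grid_mask(height=8, width=8, tile=4, ratio=0.5)
augmented = apply_mask(image, mask)
```

With theta = 0, tile = 4, and ratio = 0.5, every 4x4 tile contains a 2x2 zeroed square, so a quarter of the pixels (ratio squared) are masked.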

The tile size is sampled uniformly from a range that scales with image dimensions. In the EfficientDet implementation, the bounds are computed as:

lower = min(0.5 * height, 0.3 * width)
upper = max(0.5 * height, 0.3 * width)
tile = uniform(lower, upper)
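A minimal plain-Python sketch of this sampling step follows. The function name `sample_tile` is hypothetical, and truncating the result to a whole number of pixels is an assumption made here because the tile size is expressed in pixels.

```python
import random

def sample_tile(height, width, rng):
    """Draw a tile size uniformly between the two image-dependent bounds."""
    lower = min(0.5 * height, 0.3 * width)
    upper = max(0.5 * height, 0.3 * width)
    # Assumption: truncate to whole pixels, since the tile size is given in pixels.
    return int(rng.uniform(lower, upper))

rng = random.Random(0)  # seeded only to make the sketch reproducible
tile = sample_tile(height=512, width=640, rng=rng)
```

For a 512x640 image the bounds are 192 (0.3 * 640) and 256 (0.5 * 512), so the sampled tile always lies in that range. Using min and max keeps the range valid regardless of the aspect ratio.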

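The angle is sampled as 10 degrees times a draw from a normal distribution with mean -1 and standard deviation 1, then converted to radians. This is equivalent to a normal distribution with mean -10 degrees and standard deviation 10 degrees.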
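The angle sampling can be sketched in plain Python as follows. The function name `sample_angle` is hypothetical, and the standard-library `random.gauss` stands in for whatever random source the pipeline actually uses.

```python
import math
import random

def sample_angle(rng):
    """Draw a rotation angle in radians: N(-1, 1) scaled by 10 degrees."""
    return rng.gauss(-1.0, 1.0) * 10.0 * math.pi / 180.0

rng = random.Random(0)  # seeded only to make the sketch reproducible
angles = [sample_angle(rng) for _ in range(10000)]
mean_degrees = math.degrees(sum(angles) / len(angles))
```

Over many draws, `mean_degrees` settles near -10, which reflects the scaled mean of the underlying distribution.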

Related Pages

Implemented By

Implementation:NVIDIA_DALI_Ops_Util_Gridmask
