Principle:NVIDIA DALI GridMask Augmentation
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
GridMask augmentation is a structured information-dropping technique that overlays a regular grid of masked (zeroed) regions onto an image to improve model generalization.
Description
GridMask Augmentation is a regularization method introduced to address the limitations of random erasing and cutout techniques. Instead of masking a single contiguous rectangular region, GridMask removes information in a spatially structured pattern: a grid of evenly spaced square holes rotated by a random angle. This forces the model to rely on a wider distribution of spatial features rather than memorizing specific local patterns.
The key parameters of GridMask are:
- Ratio: The side length of each masked square as a fraction of the tile size. A ratio of 0 means no masking; a ratio approaching 1 means nearly the entire cell is masked. Because the holes are square, the masked area is roughly ratio^2 of the image, so a moderate ratio such as 0.4 (a common choice in practice) removes about 16% of the pixels.
- Tile size: The side length of each grid cell in pixels, covering one masked square plus the gap to the next. Larger tiles create coarser masks, while smaller tiles create finer ones. The tile size is typically randomized between bounds that depend on the image dimensions.
- Angle: A random rotation angle applied to the grid pattern, preventing the mask from aligning with the regular structure of the image (e.g., edges of buildings, grid-like textures).
GridMask is applied stochastically: a coin flip determines whether each sample in the batch receives the augmentation. This probabilistic application prevents the model from adapting to the fixed presence of the grid pattern.
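One way to realize the coin flip without per-sample branching is to multiply the ratio by a 0/1 draw, since a ratio of 0 leaves the image unmasked. The following is a minimal sketch in plain Python; the function name `sample_ratio`, the 0.5 probability, and the 0.4 base ratio are illustrative assumptions, not a description of any particular library's API.

```python
import random


def sample_ratio(base_ratio=0.4, p=0.5, rng=random):
    """Return base_ratio with probability p, otherwise 0.0 (no masking).

    Hypothetical helper: `rng` can be the `random` module or a
    `random.Random` instance, which makes the draw reproducible.
    """
    coin = 1 if rng.random() < p else 0
    return base_ratio * coin


# Draw one ratio per sample in a batch of 8.
rng = random.Random(0)
batch_ratios = [sample_ratio(rng=rng) for _ in range(8)]
```

Samples that draw a ratio of 0.0 pass through the masking step unchanged, so the rest of the pipeline can treat every sample uniformly.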
Unlike spatial augmentations such as cropping or flipping, GridMask does not modify bounding box annotations because it only changes pixel values, not the spatial layout of objects.
Usage
Use this principle when training object detection or image classification models that overfit on the spatial structure of training images. GridMask is usually combined with other augmentations and is applied before normalization in the pipeline.
Theoretical Basis
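The GridMask pattern is defined by a binary mask M(x, y) over image coordinates, where M = 1 marks a masked pixel. A pixel falls inside a square hole only when both of its coordinates lie in the masked portion of the tile:
M(x, y) = 1 if (x mod tile) < tile * ratio AND (y mod tile) < tile * ratio
M(x, y) = 0 otherwise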
The mask is then rotated by angle theta:
x' = x * cos(theta) - y * sin(theta)
y' = x * sin(theta) + y * cos(theta)
M_rotated(x, y) = M(x', y')
The augmented image is:
image' = image * (1 - M_rotated)
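The definitions above can be sketched in plain Python for a single-channel image stored as a list of rows. This is an illustrative sketch rather than the DALI kernel: the names `grid_mask_value` and `apply_grid_mask` are hypothetical, and it assumes the square-hole form of the mask, in which a pixel is masked only when both rotated coordinates fall within the leading `ratio` fraction of a tile.

```python
import math


def grid_mask_value(x, y, tile, ratio, angle):
    """Return 1 if pixel (x, y) is masked, else 0. `angle` is in radians."""
    # Rotate the coordinates by `angle`, following the rotation formulas.
    xr = x * math.cos(angle) - y * math.sin(angle)
    yr = x * math.sin(angle) + y * math.cos(angle)
    # Fractional position inside the tile, in [0, 1).
    fx = (xr / tile) % 1.0
    fy = (yr / tile) % 1.0
    # Assumption: square holes, so both coordinates must be in the hole.
    return 1 if (fx < ratio and fy < ratio) else 0


def apply_grid_mask(image, tile, ratio, angle=0.0):
    """Zero out masked pixels; `image` is a list of rows of numbers."""
    return [
        [pixel * (1 - grid_mask_value(x, y, tile, ratio, angle))
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# A 4x4 image of ones, one tile covering the whole image, no rotation.
image = [[1] * 4 for _ in range(4)]
masked = apply_grid_mask(image, tile=4, ratio=0.5)
# The top-left 2x2 block is zeroed; all other pixels are unchanged.
```

With `ratio=0.5` the hole covers a quarter of the tile, which matches the ratio^2 area expected for a square hole.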
The tile size is sampled uniformly from a range that scales with image dimensions. In the EfficientDet implementation, the bounds are computed as:
lower = min(0.5 * height, 0.3 * width)
upper = max(0.5 * height, 0.3 * width)
tile = uniform(lower, upper)
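The angle is computed as 10 degrees multiplied by a sample from a normal distribution with mean -1 and standard deviation 1, then converted to radians. The resulting rotation is therefore normally distributed with a mean of -10 degrees and a standard deviation of 10 degrees.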
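The sampling rules above can be sketched with Python's standard `random` module. The helper names `sample_tile` and `sample_angle` are hypothetical, and truncating the tile size to a whole number of pixels is an assumption made for illustration.

```python
import math
import random


def sample_tile(height, width, rng=random):
    """Sample a tile size (pixels) from bounds that scale with the image."""
    lower = min(0.5 * height, 0.3 * width)
    upper = max(0.5 * height, 0.3 * width)
    # Assumption: truncate to an integer pixel count.
    return int(rng.uniform(lower, upper))


def sample_angle(rng=random):
    """Sample a rotation angle in radians: 10 degrees times N(-1, 1)."""
    degrees = rng.gauss(-1.0, 1.0) * 10.0
    return math.radians(degrees)


rng = random.Random(0)
tile = sample_tile(512, 512, rng)   # bounds are 153.6 and 256.0 here
angle = sample_angle(rng)           # radians, centered near -0.1745
```

For a 512 x 512 image the two bounds are 0.3 * 512 = 153.6 and 0.5 * 512 = 256, so each tile spans roughly a third to a half of the image side.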