Implementation: NVIDIA DALI fn.crop_mirror_normalize
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, GPU_Computing, Image_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The fn.crop_mirror_normalize operator in NVIDIA DALI performs fused center cropping, optional horizontal mirroring, channel-wise mean/std normalization, layout transposition (HWC to CHW), and type conversion (uint8 to float32) in a single GPU kernel.
Description
fn.crop_mirror_normalize is the final preprocessing operator in a DALI image classification pipeline. It fuses six operations into a single GPU kernel:
- Center crop to the specified dimensions via the crop parameter
- Horizontal mirror via the mirror parameter (a DataNode from fn.random.coin_flip for training, or False for validation)
- Mean subtraction per channel via the mean parameter
- Standard deviation division per channel via the std parameter
- Layout transposition from HWC to CHW via the output_layout parameter
- Type conversion from uint8 to float32 via the dtype parameter
The mean and std values are specified in the [0, 255] scale (e.g., mean=[0.485*255, 0.456*255, 0.406*255]) because the input images are uint8 tensors with pixel values in [0, 255]. The normalization formula applied is: output = (input - mean) / std.
The operator accepts input on GPU (via the .gpu() transfer) and produces output on GPU, keeping the entire pipeline on-device from this point forward.
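To see why the constants are pre-multiplied by 255, here is a small NumPy check (an illustrative sketch, not DALI code) showing that normalizing raw uint8 values with [0, 255]-scale constants is algebraically identical to the familiar divide-by-255-then-normalize convention:

```python
import numpy as np

# One RGB pixel in raw uint8 scale, cast to float for arithmetic.
pixels = np.array([200, 128, 64], dtype=np.uint8).astype(np.float32)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet means, [0, 1] scale
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # ImageNet stds, [0, 1] scale

out_255 = (pixels - mean * 255) / (std * 255)  # what the operator computes on uint8 input
out_01 = (pixels / 255 - mean) / std           # torchvision-style normalize on [0, 1] input
assert np.allclose(out_255, out_01, atol=1e-5)
```

Both expressions reduce to (pixels - 255*mean) / (255*std), so pre-scaling the constants lets the operator skip a separate division of every pixel by 255.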
Usage
Use this operator as the last step in a DALI image preprocessing pipeline to produce the final normalized float32 CHW tensor ready for consumption by a PyTorch model. Ensure the input is transferred to GPU (using .gpu()) before passing it to this operator.
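The fused behavior can be sketched as a plain NumPy reference (an illustration of the assumed semantics, not DALI's implementation; the function name and structure here are hypothetical):

```python
import numpy as np

def crop_mirror_normalize_ref(img, crop_hw, mirror, mean, std):
    """NumPy stand-in for the fused op: center crop -> optional horizontal
    flip -> per-channel normalize -> HWC-to-CHW transpose -> float32 cast."""
    h, w, _ = img.shape
    ch, cw = crop_hw
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    out = img[y0:y0 + ch, x0:x0 + cw].astype(np.float32)   # center crop + cast
    if mirror:
        out = out[:, ::-1]                                  # flip along width
    out = (out - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return np.ascontiguousarray(out.transpose(2, 0, 1))     # HWC -> CHW

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
chw = crop_mirror_normalize_ref(
    img, (224, 224), mirror=False,
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
assert chw.shape == (3, 224, 224) and chw.dtype == np.float32
```

DALI performs all of these steps in one GPU kernel pass, whereas composing them as separate array operations (as above) would materialize an intermediate tensor per step.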
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 153-161)
- File: docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py (lines 72-79)
Signature (ResNet50)
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(crop, crop),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)
Signature (EfficientNet)
output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout=output_layout,
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
Import
import nvidia.dali.fn as fn
import nvidia.dali.types as types
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| images | DataNode (GPU) | Yes | Augmented image tensor on GPU; typically HWC uint8 format. Use .gpu() to transfer from CPU/mixed if needed. |
| dtype | types.DALIDataType | No | Output data type; types.FLOAT (float32) is the default and matches what PyTorch models expect. |
| output_layout | str | No | Target tensor layout; "CHW" for PyTorch models, "HWC" for channels-last memory format |
| crop | tuple(int, int) | No | Center crop dimensions (height, width); e.g., (224, 224) |
| mean | list[float] | No | Per-channel mean values for normalization, in [0, 255] scale. ImageNet: [0.485*255, 0.456*255, 0.406*255] = [123.675, 116.28, 103.53] |
| std | list[float] | No | Per-channel standard deviation values for normalization, in [0, 255] scale. ImageNet: [0.229*255, 0.224*255, 0.225*255] = [58.395, 57.12, 57.375] |
| mirror | DataNode/bool | No | Horizontal mirror flag; DataNode from fn.random.coin_flip for training, False for validation |
Outputs
| Name | Type | Description |
|---|---|---|
| images | DataNode (GPU) | Normalized float32 tensor in CHW layout; shape [3, crop_h, crop_w] (e.g., [3, 224, 224]). Values are centered around 0 with approximately unit variance per channel. |
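As a quick sanity check of the "centered around 0" claim, a NumPy stand-in (not DALI code) shows that an image whose pixels all equal the ImageNet channel means normalizes to exactly zero:

```python
import numpy as np

# ImageNet constants in [0, 255] scale, as passed to the operator.
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # [0.485, 0.456, 0.406] * 255
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)     # [0.229, 0.224, 0.225] * 255

img = np.broadcast_to(mean, (224, 224, 3))  # every pixel sits at the channel mean
out = (img - mean) / std
assert np.allclose(out, 0.0)
```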
Usage Examples
ResNet50 Training Normalization
# After resize and augmentation:
mirror = fn.random.coin_flip(probability=0.5)
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)
Validation Normalization (No Mirror)
# For validation: deterministic center crop, no mirroring
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=False,
)
EfficientNet with Channels-Last Output
# When using channels-last memory format for improved GPU performance:
output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout="HWC",  # channels-last for torch.channels_last
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)