Implementation: NVIDIA DALI fn.crop_mirror_normalize
| Knowledge Sources | |
|---|---|
| Domains | Data_Pipeline, GPU_Computing, Image_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The fn.crop_mirror_normalize operator in NVIDIA DALI performs fused center cropping, optional horizontal mirroring, channel-wise mean/std normalization, layout transposition (HWC to CHW), and type conversion (uint8 to float32) in a single GPU kernel.
Description
fn.crop_mirror_normalize is the final preprocessing operator in a DALI image classification pipeline. It fuses six operations into a single GPU kernel:
- Center crop to the specified dimensions via the crop parameter
- Horizontal mirror via the mirror parameter (a DataNode from fn.random.coin_flip for training, or False for validation)
- Mean subtraction per channel via the mean parameter
- Standard deviation division per channel via the std parameter
- Layout transposition from HWC to CHW via the output_layout parameter
- Type conversion from uint8 to float32 via the dtype parameter
The mean and std values are specified in the [0, 255] scale (e.g., mean=[0.485*255, 0.456*255, 0.406*255]) because the input images are uint8 tensors with pixel values in [0, 255]. The normalization formula applied is: output = (input - mean) / std.
The operator accepts input on GPU (via the .gpu() transfer) and produces output on GPU, keeping the entire pipeline on-device from this point forward.
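To see why the constants are pre-multiplied by 255, here is a small NumPy check (an illustrative sketch, not DALI code) showing that normalizing raw uint8 values with [0, 255]-scale constants is algebraically identical to the familiar divide-by-255-then-normalize convention:

```python
import numpy as np

# One RGB pixel in raw uint8 scale, cast to float for arithmetic.
pixels = np.array([200, 128, 64], dtype=np.uint8).astype(np.float32)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet means, [0, 1] scale
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # ImageNet stds, [0, 1] scale

out_255 = (pixels - mean * 255) / (std * 255)  # what the operator computes on uint8 input
out_01 = (pixels / 255 - mean) / std           # torchvision-style normalize on [0, 1] input
assert np.allclose(out_255, out_01, atol=1e-5)
```

Both expressions reduce to (pixels - 255*mean) / (255*std), so pre-scaling the constants lets the operator skip a separate division of every pixel by 255.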
Usage
Use this operator as the last step in a DALI image preprocessing pipeline to produce the final normalized float32 CHW tensor ready for consumption by a PyTorch model. Ensure the input is transferred to GPU (using .gpu()) before passing it to this operator.
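The fused behavior can be sketched as a plain NumPy reference (an illustration of the assumed semantics, not DALI's implementation; the function name and structure here are hypothetical):

```python
import numpy as np

def crop_mirror_normalize_ref(img, crop_hw, mirror, mean, std):
    """NumPy stand-in for the fused op: center crop -> optional horizontal
    flip -> per-channel normalize -> HWC-to-CHW transpose -> float32 cast."""
    h, w, _ = img.shape
    ch, cw = crop_hw
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    out = img[y0:y0 + ch, x0:x0 + cw].astype(np.float32)   # center crop + cast
    if mirror:
        out = out[:, ::-1]                                  # flip along width
    out = (out - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return np.ascontiguousarray(out.transpose(2, 0, 1))     # HWC -> CHW

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
chw = crop_mirror_normalize_ref(
    img, (224, 224), mirror=False,
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
assert chw.shape == (3, 224, 224) and chw.dtype == np.float32
```

DALI performs all of these steps in one GPU kernel pass, whereas composing them as separate array operations (as above) would materialize an intermediate tensor per step.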
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 153-161)
- File: docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py (lines 72-79)
Signature (ResNet50)
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(crop, crop),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)
Signature (EfficientNet)
output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout=output_layout,
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
Import
import nvidia.dali.fn as fn
import nvidia.dali.types as types
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| images | DataNode (GPU) | Yes | Augmented image tensor on GPU; typically HWC uint8 format. Use .gpu() to transfer from CPU/mixed if needed. |
| dtype | types.DALIDataType | No | Output data type; types.FLOAT (float32) is the default and matches what PyTorch models expect. |
| output_layout | str | No | Target tensor layout; "CHW" for PyTorch models, "HWC" for channels-last memory format |
| crop | tuple(int, int) | No | Center crop dimensions (height, width); e.g., (224, 224) |
| mean | list[float] | No | Per-channel mean values for normalization, in [0, 255] scale. ImageNet: [0.485*255, 0.456*255, 0.406*255] = [123.675, 116.28, 103.53] |
| std | list[float] | No | Per-channel standard deviation values for normalization, in [0, 255] scale. ImageNet: [0.229*255, 0.224*255, 0.225*255] = [58.395, 57.12, 57.375] |
| mirror | DataNode/bool | No | Horizontal mirror flag; DataNode from fn.random.coin_flip for training, False for validation |
Outputs
| Name | Type | Description |
|---|---|---|
| images | DataNode (GPU) | Normalized float32 tensor in CHW layout; shape [3, crop_h, crop_w] (e.g., [3, 224, 224]). Values are centered around 0 with approximately unit variance per channel. |
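As a quick sanity check of the "centered around 0" claim, a NumPy stand-in (not DALI code) shows that an image whose pixels all equal the ImageNet channel means normalizes to exactly zero:

```python
import numpy as np

# ImageNet constants in [0, 255] scale, as passed to the operator.
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # [0.485, 0.456, 0.406] * 255
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)     # [0.229, 0.224, 0.225] * 255

img = np.broadcast_to(mean, (224, 224, 3))  # every pixel sits at the channel mean
out = (img - mean) / std
assert np.allclose(out, 0.0)
```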
Usage Examples
ResNet50 Training Normalization
# After resize and augmentation:
mirror = fn.random.coin_flip(probability=0.5)
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)
Validation Normalization (No Mirror)
# For validation: deterministic center crop, no mirroring
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=False,
)
EfficientNet with Channels-Last Output
# When using channels-last memory format for improved GPU performance:
output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout="HWC",  # channels-last for torch.channels_last
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)