
Implementation:NVIDIA DALI Fn Crop Mirror Normalize

From Leeroopedia


Knowledge Sources
Domains Data_Pipeline, GPU_Computing, Image_Processing
Last Updated 2026-02-08 00:00 GMT

Overview

fn.crop_mirror_normalize is an operator in NVIDIA DALI that performs fused center cropping, optional horizontal mirroring, channel-wise mean/std normalization, layout transposition (HWC to CHW), and type conversion (uint8 to float32) in a single GPU kernel.

Description

fn.crop_mirror_normalize is the final preprocessing operator in a DALI image classification pipeline. It fuses six operations into a single GPU kernel:

  1. Center crop to the specified dimensions via the crop parameter
  2. Horizontal mirror via the mirror parameter (a DataNode from fn.random.coin_flip for training, or False for validation)
  3. Mean subtraction per channel via the mean parameter
  4. Standard deviation division per channel via the std parameter
  5. Layout transposition from HWC to CHW via the output_layout parameter
  6. Type conversion from uint8 to float32 via the dtype parameter

The mean and std values are specified in the [0, 255] scale (e.g., mean=[0.485*255, 0.456*255, 0.406*255]) because the input images are uint8 tensors with pixel values in [0, 255]. The normalization formula applied is: output = (input - mean) / std.
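As a quick arithmetic check (plain Python, independent of DALI), the ImageNet statistics scale up to the [0, 255] range as shown below, and a mid-gray pixel normalizes to values near zero:

```python
# ImageNet per-channel statistics, rescaled from [0, 1] to the [0, 255] pixel range
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]  # [123.675, 116.28, 103.53]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]   # [58.395, 57.12, 57.375]

# The operator applies: output = (input - mean) / std, per channel
pixel = [128, 128, 128]  # a mid-gray uint8 pixel
normalized = [(p - m) / s for p, m, s in zip(pixel, mean, std)]
# Each channel lands near 0, as expected for a pixel close to the dataset mean
```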

The operator accepts input on GPU (via the .gpu() transfer) and produces output on GPU, keeping the entire pipeline on-device from this point forward.

Usage

Use this operator as the last step in a DALI image preprocessing pipeline to produce the final normalized float32 CHW tensor ready for consumption by a PyTorch model. Ensure the input is transferred to GPU (using .gpu()) before passing it to this operator.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/pytorch/resnet50/main.py (lines 153-161)
  • File: docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py (lines 72-79)

Signature (ResNet50)

images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(crop, crop),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)

Signature (EfficientNet)

output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout=output_layout,
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)

Import

import nvidia.dali.fn as fn
import nvidia.dali.types as types

I/O Contract

Inputs

Name Type Required Description
images DataNode (GPU) Yes Augmented image tensor on GPU; typically HWC uint8 format. Use .gpu() to transfer from CPU/mixed if needed.
dtype types.DALIDataType No Output data type; types.FLOAT for float32 (default). Required for model consumption.
output_layout str No Target tensor layout; "CHW" for PyTorch models, "HWC" for channels-last memory format
crop tuple(int, int) No Center crop dimensions (height, width); e.g., (224, 224)
mean list[float] No Per-channel mean values for normalization, in [0, 255] scale. ImageNet: [0.485*255, 0.456*255, 0.406*255] = [123.675, 116.28, 103.53]
std list[float] No Per-channel standard deviation values for normalization, in [0, 255] scale. ImageNet: [0.229*255, 0.224*255, 0.225*255] = [58.395, 57.12, 57.375]
mirror DataNode/bool No Horizontal mirror flag; DataNode from fn.random.coin_flip for training, False for validation

Outputs

Name Type Description
images DataNode (GPU) Normalized float32 tensor; with output_layout="CHW" the shape is [3, crop_h, crop_w] (e.g., [3, 224, 224]). Values are centered around 0 with approximately unit variance per channel.
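For intuition (and CPU-side verification of outputs), the fused transform can be mimicked in NumPy. This is an illustrative reference implementation, not DALI's actual kernel:

```python
import numpy as np

def crop_mirror_normalize_ref(img, crop, mean, std, mirror=False):
    """CPU reference for the fused op: center crop -> optional horizontal
    mirror -> per-channel (x - mean) / std -> HWC-to-CHW -> float32.
    Illustrative only; DALI performs all of this in one GPU kernel."""
    h, w, _ = img.shape
    ch, cw = crop
    top, left = (h - ch) // 2, (w - cw) // 2
    out = img[top:top + ch, left:left + cw].astype(np.float32)  # center crop
    if mirror:
        out = out[:, ::-1]                                      # horizontal flip
    out = (out - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return np.ascontiguousarray(out.transpose(2, 0, 1))         # HWC -> CHW

# Example: a random 256x256 uint8 image cropped to 224x224
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
out = crop_mirror_normalize_ref(
    img, (224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
# out.shape == (3, 224, 224), out.dtype == float32
```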

Usage Examples

ResNet50 Training Normalization

# After resize and augmentation:
mirror = fn.random.coin_flip(probability=0.5)
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=mirror,
)

Validation Normalization (No Mirror)

# For validation: deterministic center crop, no mirroring
images = fn.crop_mirror_normalize(
    images.gpu(),
    dtype=types.FLOAT,
    output_layout="CHW",
    crop=(224, 224),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    mirror=False,
)

EfficientNet with Channels-Last Output

# When using channels-last memory format for improved GPU performance:
output = fn.crop_mirror_normalize(
    output,
    dtype=types.FLOAT,
    output_layout="HWC",  # channels-last for torch.channels_last
    crop=(image_size, image_size),
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)
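A small NumPy sketch (illustrative, not DALI) showing that output_layout changes only the memory arrangement, not the values: normalizing in HWC and transposing gives the same result as normalizing in CHW directly.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
mean = np.array([0.485 * 255, 0.456 * 255, 0.406 * 255], dtype=np.float32)
std = np.array([0.229 * 255, 0.224 * 255, 0.225 * 255], dtype=np.float32)

# Normalize with channels last (broadcasts over the trailing channel axis)
hwc = (img - mean) / std
# Normalize with channels first (broadcasts over the leading channel axis)
chw = (img.transpose(2, 0, 1) - mean[:, None, None]) / std[:, None, None]
# Same values, different layout: transposing one yields the other
```

In PyTorch, the "HWC" layout pairs naturally with the torch.channels_last memory format used in the EfficientNet example above.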

Related Pages

Implements Principle

Requires Environment
