Implementation:NVIDIA DALI Fn Crop

Knowledge Sources	NVIDIA DALI
Domains	Video_Processing, GPU_Computing, Data_Augmentation
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete GPU-accelerated spatial crop operator for extracting fixed-size regions from video frame sequences, provided by the NVIDIA DALI library.

Description

fn.crop is a DALI pipeline operator that extracts a spatial sub-region of specified dimensions from input tensors. When applied to video sequences with shape [F, H, W, C], it crops the same spatial region from every frame in the sequence, producing an output of shape [F, crop_h, crop_w, C]. The crop position is controlled by the crop_pos_x and crop_pos_y parameters, which accept normalized coordinates in the range [0.0, 1.0]. These coordinates place the upper-left corner of the crop window within the range of valid offsets: 0.0 aligns the window with the left (or top) edge of the input, 1.0 aligns it with the right (or bottom) edge, and 0.5 centers it.
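The mapping from a normalized position to a pixel offset can be sketched in plain Python. This is a minimal illustration, assuming the relation crop_x = crop_pos_x * (W - crop_w) given in the DALI documentation for crop_pos_x. The helper name crop_offset and the round-half-up rounding are assumptions made for the example, not part of the DALI API.

```python
import math

def crop_offset(pos_norm, input_size, crop_size):
    """Map a normalized position in [0, 1] to the pixel offset where the crop window starts.

    Hypothetical helper for illustration only; it is not a DALI function.
    """
    if not 0.0 <= pos_norm <= 1.0:
        raise ValueError("pos_norm must lie in [0.0, 1.0]")
    if crop_size > input_size:
        raise ValueError("crop window is larger than the input")
    # The window can slide over (input_size - crop_size) pixels.
    slack = input_size - crop_size
    # Round half up to the nearest pixel (assumed rounding mode).
    return int(math.floor(pos_norm * slack + 0.5))

# 1280-pixel-wide frame, 256-pixel-wide crop
print(crop_offset(0.0, 1280, 256))  # 0: window flush with the left edge
print(crop_offset(0.5, 1280, 256))  # 512: window centered
print(crop_offset(1.0, 1280, 256))  # 1024: window flush with the right edge
```

The same function applies to the vertical axis with H and crop_h in place of W and crop_w.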

In the video super-resolution pipeline, the crop position is randomized by feeding the output of fn.random.uniform(range=(0.0, 1.0)) into both crop_pos_x and crop_pos_y. Because DALI evaluates these random operators once per sample (not once per frame), the same crop position is applied consistently across all frames in a given sequence, maintaining temporal coherence.
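The effect of drawing one position per sample can be mimicked without DALI. The sketch below uses nested Python lists as a stand-in for a [F, H, W] tensor; crop_sequence is a hypothetical helper, and the offset arithmetic assumes the position-to-offset relation from the DALI documentation with round-half-up rounding.

```python
import random

def crop_sequence(frames, crop_h, crop_w, pos_x, pos_y):
    """Crop every frame of a [F][H][W] nested-list sequence with the same window.

    Hypothetical helper for illustration only; it is not a DALI function.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # One offset pair per sequence, derived from the per-sample positions.
    top = int(pos_y * (height - crop_h) + 0.5)
    left = int(pos_x * (width - crop_w) + 0.5)
    return [
        [row[left:left + crop_w] for row in frame[top:top + crop_h]]
        for frame in frames
    ]

# Toy sequence: 3 frames of 4x6 pixels; each pixel stores its (row, col) coordinates.
frames = [[[(r, c) for c in range(6)] for r in range(4)] for _ in range(3)]

rng = random.Random(0)
# Drawn once per sample, not once per frame.
pos_x, pos_y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
cropped = crop_sequence(frames, 2, 3, pos_x, pos_y)

# Every frame covers the same spatial region.
assert all(frame == cropped[0] for frame in cropped)
```

Had the position been drawn inside the per-frame loop instead, consecutive frames could show different regions, which would break the temporal alignment that video super-resolution training depends on.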

The operator also performs type conversion via the dtype parameter. When dtype=types.FLOAT is specified on a UINT8 input, the pixel values are cast from the [0, 255] integer range to the [0.0, 255.0] floating-point range. The values are converted but not rescaled, so any normalization to [0.0, 1.0] must be applied separately. This conversion is fused with the crop operation, avoiding a separate type-cast kernel launch.
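The difference between casting and rescaling is easy to overlook, so here is a small plain-Python sketch of it. The helper cast_uint8_to_float is hypothetical, and the division by 255 is shown only as an assumed downstream step that some models require; it is not something fn.crop does.

```python
def cast_uint8_to_float(pixels):
    """Convert 8-bit pixel values to floats without changing their magnitude.

    Hypothetical helper for illustration only; it is not a DALI function.
    """
    return [float(p) for p in pixels]

uint8_pixels = [0, 128, 255]
as_float = cast_uint8_to_float(uint8_pixels)
print(as_float)  # [0.0, 128.0, 255.0]

# Separate, optional step if a model expects inputs in [0.0, 1.0].
normalized = [p / 255.0 for p in as_float]
print(normalized[0], normalized[-1])  # 0.0 1.0
```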

The crop parameter accepts a list of two integers [height, width] specifying the output spatial dimensions. If the requested crop size exceeds the input dimensions, the operation fails under the operator's default out-of-bounds handling, so the crop size must fit within the smallest height and the smallest width found in the dataset.
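A crop size can be checked against the dataset before the pipeline is built. The sketch below is one possible pre-flight check, assuming the video resolutions have already been collected as (height, width) pairs by some other means; validate_crop is a hypothetical helper, not a DALI function.

```python
def validate_crop(crop, resolutions):
    """Check that a [height, width] crop fits inside every (H, W) resolution.

    Returns the smallest height and smallest width seen, which bound the crop size.
    Hypothetical helper for illustration only; it is not a DALI function.
    """
    crop_h, crop_w = crop
    # Take the minima per axis, since the shortest and narrowest video may differ.
    min_h = min(h for h, _ in resolutions)
    min_w = min(w for _, w in resolutions)
    if crop_h > min_h or crop_w > min_w:
        raise ValueError(
            f"crop {crop_h}x{crop_w} exceeds smallest input {min_h}x{min_w}"
        )
    return min_h, min_w

resolutions = [(1080, 1920), (720, 1280), (540, 960)]
print(validate_crop([512, 512], resolutions))  # (540, 960)
```

Running the check once at dataset-preparation time surfaces an oversized crop as a clear error message rather than a failure in the middle of training.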

Usage

Use fn.crop immediately after the video reader to extract random spatial patches from decoded video sequences. It serves two purposes in DALI video pipelines: spatial augmentation, and normalizing videos of differing resolutions to a single output size.

Code Reference

Source Location

Repository: NVIDIA DALI
File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 23-25)

Signature

fn.crop(
    images,
    crop=crop_size,
    dtype=types.FLOAT,
    crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
    crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
)

Import

import nvidia.dali.fn as fn
import nvidia.dali.types as types

I/O Contract

Inputs

Name	Type	Required	Description
images	DALI TensorGPU	Yes	Input video tensor with shape [F, H, W, C] in FHWC layout
crop	list of int [height, width]	Yes	Target spatial dimensions of the output crop
dtype	types.DALIDataType	No	Output data type; types.FLOAT casts UINT8 pixels to float32
crop_pos_x	float or DALI DataNode	No	Normalized horizontal crop position in [0.0, 1.0]; 0.0 aligns the crop window with the left edge, 1.0 with the right edge
crop_pos_y	float or DALI DataNode	No	Normalized vertical crop position in [0.0, 1.0]; 0.0 aligns the crop window with the top edge, 1.0 with the bottom edge

Outputs

Name	Type	Description
cropped_images	DALI TensorGPU	Cropped video tensor with shape [F, crop_h, crop_w, C] in FHWC layout, dtype FLOAT

Usage Examples

Random Spatial Crop in Video Pipeline

from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
    images = fn.crop(
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    return images

Fixed Center Crop

# Center crop by setting both position parameters to 0.5
images = fn.crop(
    images,
    crop=[256, 256],
    dtype=types.FLOAT,
    crop_pos_x=0.5,
    crop_pos_y=0.5
)

Related Pages

Implements Principle

Principle:NVIDIA_DALI_Video_Spatial_Crop

Requires Environment

Environment:NVIDIA_DALI_CUDA_GPU_Environment
