Implementation:NVIDIA DALI Fn Crop

From Leeroopedia


Knowledge Sources
Domains Video_Processing, GPU_Computing, Data_Augmentation
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete GPU-accelerated spatial crop operator for extracting fixed-size regions from video frame sequences, provided by the NVIDIA DALI library.

Description

fn.crop is a DALI pipeline operator that extracts a spatial sub-region of specified dimensions from its input tensors. When applied to video sequences with shape [F, H, W, C], it crops the same spatial region from every frame, producing an output of shape [F, crop_h, crop_w, C]. The crop position is controlled by the crop_pos_x and crop_pos_y parameters, which take normalized coordinates in [0.0, 1.0]. These are not fractions of the full frame. They place the window within the range of valid positions, so its top-left corner lands at crop_pos_x * (W - crop_w) horizontally and crop_pos_y * (H - crop_h) vertically. A value of 0.0 aligns the window with the left (or top) edge, 1.0 aligns it with the right (or bottom) edge, and the default of 0.5 centers it.
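The positioning rule can be modeled on the CPU with NumPy. The sketch below illustrates the documented semantics and is not DALI's kernel: the helper names crop_anchor and reference_crop are hypothetical, and rounding the anchor half-up to an integer pixel is an assumption.

```python
import math

import numpy as np


def crop_anchor(pos: float, in_size: int, crop_size: int) -> int:
    """Map a normalized position in [0.0, 1.0] to the window's first pixel index.

    Assumption: anchor = pos * (in_size - crop_size), rounded half-up.
    """
    if not 0.0 <= pos <= 1.0:
        raise ValueError("position must be in [0.0, 1.0]")
    if crop_size > in_size:
        raise ValueError("crop window is larger than the input")
    return math.floor(pos * (in_size - crop_size) + 0.5)


def reference_crop(frames: np.ndarray, crop, pos_x: float, pos_y: float) -> np.ndarray:
    """Cut the same [crop_h, crop_w] window out of every frame of an FHWC array."""
    _, height, width, _ = frames.shape
    crop_h, crop_w = crop
    y0 = crop_anchor(pos_y, height, crop_h)
    x0 = crop_anchor(pos_x, width, crop_w)
    # A single slice over the H and W axes applies the window to all F frames.
    return frames[:, y0:y0 + crop_h, x0:x0 + crop_w, :]


# 3 frames of a 6x8 RGB clip (values are just indices), cropped to 4x4.
clip = np.arange(3 * 6 * 8 * 3).reshape(3, 6, 8, 3)
bottom_right = reference_crop(clip, [4, 4], pos_x=1.0, pos_y=1.0)
print(bottom_right.shape)  # (3, 4, 4, 3)
```

With both positions at 1.0 the window's bottom-right corner coincides with the frame's, which is why a position of 1.0 never runs past the input.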

In the video super-resolution pipeline, the crop position is randomized by feeding the output of fn.random.uniform(range=(0.0, 1.0)) into both crop_pos_x and crop_pos_y. fn.random.uniform generates one value per sample rather than one per frame, and each sample from the video reader is a whole F-frame sequence, so every frame of a given sequence is cropped at the same position. This keeps the patches temporally coherent. Because the two calls are separate operators, the horizontal and vertical positions are drawn independently.
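The per-sample behaviour can be simulated on the CPU: draw one (x, y) pair per sequence in a batch, the way fn.random.uniform yields one value per sample, and apply that pair to every frame. This NumPy simulation is a hypothetical illustration (random_crop_batch and the seeded generator are not part of DALI), and it assumes the same half-up anchor rounding as a pure modeling choice.

```python
import math

import numpy as np


def random_crop_batch(batch, crop, rng):
    """Crop each FHWC sequence in a batch at one random position per sequence.

    Mimics per-sample evaluation: a single (pos_x, pos_y) draw is shared by all
    frames of a sequence. The anchor rounding (half-up) is an assumption.
    """
    crop_h, crop_w = crop
    outputs, anchors = [], []
    for seq in batch:
        _, height, width, _ = seq.shape
        pos_x, pos_y = rng.uniform(0.0, 1.0, size=2)  # one draw per sample
        y0 = math.floor(pos_y * (height - crop_h) + 0.5)
        x0 = math.floor(pos_x * (width - crop_w) + 0.5)
        outputs.append(seq[:, y0:y0 + crop_h, x0:x0 + crop_w, :])
        anchors.append((y0, x0))
    return outputs, anchors


rng = np.random.default_rng(0)
# Two 5-frame sequences in which frame f equals frame 0 plus f, so identical
# windows across frames differ by exactly the frame index.
first_frames = rng.integers(0, 256, size=(2, 1, 32, 48, 3))
batch = [f0 + np.arange(5).reshape(5, 1, 1, 1) for f0 in first_frames]
crops, anchors = random_crop_batch(batch, [16, 16], rng)
```

Each sequence gets its own anchor, but within a sequence every frame shares it, which is the temporal-coherence property described above.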

The operator can also change the output type through the dtype parameter. When dtype=types.FLOAT is specified for a UINT8 input, pixel values are converted from integers in [0, 255] to float32 values in [0.0, 255.0]; no rescaling to [0, 1] takes place. This conversion is fused with the crop operation, avoiding a separate type-cast kernel launch.
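The effect of dtype=types.FLOAT on pixel values can be reproduced with a NumPy cast. This assumes, as described above, that the FLOAT output is a value-preserving conversion; the tiny clip is made up for illustration.

```python
import numpy as np

# A hypothetical 2-frame, 2x2 RGB clip decoded as UINT8 (normalized=False).
frames_u8 = np.array(
    [[[[0, 128, 255], [10, 20, 30]], [[40, 50, 60], [70, 80, 90]]]] * 2,
    dtype=np.uint8,
)

# Equivalent of dtype=types.FLOAT: a value-preserving cast to float32.
frames_f32 = frames_u8.astype(np.float32)

# If a model expects [0, 1] inputs, that scaling is an extra, explicit step.
frames_unit = frames_f32 / 255.0
```

Keeping values in [0.0, 255.0] means any normalization a model expects has to happen in a later step.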

The crop parameter takes a list of two integers, [height, width], specifying the output spatial dimensions. With the default out_of_bounds_policy="error", a crop larger than the input makes the operation fail (the alternative policies "pad" and "trim_to_shape" relax this), so the crop size must fit within the smallest video resolution in the dataset.
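Since the crop has to fit every clip, it can be worth validating the chosen size against the dataset's frame sizes before building the pipeline. The helper below is a hypothetical sketch; gathering the (height, width) pairs, for example by probing each file, is outside its scope.

```python
def check_crop_fits(resolutions, crop):
    """Return the smallest height and width seen, raising if the crop does not fit.

    resolutions: iterable of (height, width) pairs, one per video (hypothetical input).
    crop: [crop_h, crop_w], the same value passed to fn.crop.
    """
    resolutions = list(resolutions)
    if not resolutions:
        raise ValueError("no resolutions given")
    min_h = min(h for h, _ in resolutions)
    min_w = min(w for _, w in resolutions)
    crop_h, crop_w = crop
    if crop_h > min_h or crop_w > min_w:
        raise ValueError(
            f"crop {crop_h}x{crop_w} exceeds smallest frame size {min_h}x{min_w}"
        )
    return min_h, min_w


# Mixed 1080p / 720p / 540p dataset: a 512x960 crop fits every clip.
print(check_crop_fits([(1080, 1920), (720, 1280), (540, 960)], [512, 960]))  # (540, 960)
```

Checking the minimum height and minimum width independently is sufficient: a crop fits every clip exactly when it fits both minima, even if they come from different videos.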

Usage

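Use fn.crop directly after the video reader to extract random spatial patches from decoded video sequences. In DALI video pipelines it serves two purposes: spatial augmentation (a random patch location per sequence) and resolution normalization (every output sample has the same [crop_h, crop_w] size regardless of the source video's resolution).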

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 23-25)

Signature

fn.crop(
    images,
    crop=crop_size,
    dtype=types.FLOAT,
    crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
    crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
)

Import

import nvidia.dali.fn as fn
import nvidia.dali.types as types

I/O Contract

Inputs

Name | Type | Required | Description
images | DataNode (GPU) | Yes | Input video batch; each sample has shape [F, H, W, C] (FHWC layout)
crop | list of two ints [height, width] | Yes | Spatial size of the output crop
dtype | types.DALIDataType | No | Output data type; types.FLOAT casts UINT8 pixels to float32. If omitted, the input type is kept
crop_pos_x | float or DataNode | No (default 0.5) | Normalized horizontal position in [0.0, 1.0]; 0.0 aligns the window with the left edge, 1.0 with the right edge
crop_pos_y | float or DataNode | No (default 0.5) | Normalized vertical position in [0.0, 1.0]; 0.0 aligns the window with the top edge, 1.0 with the bottom edge

Outputs

Name | Type | Description
cropped_images | DataNode (GPU) | Cropped video batch; each sample has shape [F, crop_h, crop_w, C] (FHWC layout), with the type set by dtype (FLOAT in this pipeline)
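The output contract also gives a quick per-batch memory estimate. Because FLOAT output uses 4 bytes per value versus 1 for UINT8, the cast quadruples the size of the cropped batch. A small sketch, with hypothetical helper names and example sizes:

```python
import numpy as np


def crop_output_shape(input_shape, crop):
    """Shape of one cropped FHWC sample: [F, crop_h, crop_w, C]."""
    frames, _, _, channels = input_shape
    crop_h, crop_w = crop
    return (frames, crop_h, crop_w, channels)


def batch_bytes(batch_size, input_shape, crop, dtype):
    """Bytes needed to hold one batch of cropped sequences in the given dtype."""
    shape = crop_output_shape(input_shape, crop)
    return batch_size * int(np.prod(shape)) * np.dtype(dtype).itemsize


# Example: batch of 4 sequences, 5 frames of 540x960 RGB, cropped to 256x256.
shape = crop_output_shape((5, 540, 960, 3), [256, 256])          # (5, 256, 256, 3)
as_uint8 = batch_bytes(4, (5, 540, 960, 3), [256, 256], np.uint8)    # 3932160 bytes
as_float = batch_bytes(4, (5, 540, 960, 3), [256, 256], np.float32)  # 15728640 bytes (15 MiB)
```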

Usage Examples

Random Spatial Crop in Video Pipeline

from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    # Decode sequences of sequence_length frames on the GPU as raw UINT8 RGB (FHWC)
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
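    # Crop a crop_size patch at a random position (one draw per sequence, shared
    # by all of its frames) and cast the UINT8 pixels to float32
    images = fn.crop(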
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    return images

Fixed Center Crop

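# Inside a pipeline_def body, with images from fn.readers.video:
# center crop by setting both position parameters to 0.5.
# 0.5 is also the default for crop_pos_x/crop_pos_y, so both could be omitted.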
images = fn.crop(
    images,
    crop=[256, 256],
    dtype=types.FLOAT,
    crop_pos_x=0.5,
    crop_pos_y=0.5
)
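The reason 0.5 centers the window follows from the anchor rule: at 0.5 the leftover margin W - crop_w is split in half. The short check below assumes half-up rounding of the anchor (an assumption, as is the helper name center_anchor) and shows that the two margins never differ by more than one pixel, including when the margin is odd.

```python
import math


def center_anchor(in_size: int, crop_size: int, pos: float = 0.5) -> int:
    """Anchor from pos * (in_size - crop_size); half-up rounding is assumed."""
    return math.floor(pos * (in_size - crop_size) + 0.5)


margins = {}
for width, crop_w in [(1920, 256), (1280, 256), (961, 256)]:
    left = center_anchor(width, crop_w)
    margins[width] = (left, width - crop_w - left)  # (left, right) margins
print(margins)  # {1920: (832, 832), 1280: (512, 512), 961: (353, 352)}
```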

Related Pages

Implements Principle

Requires Environment
