Implementation:NVIDIA DALI Fn Crop
| Knowledge Sources | |
|---|---|
| Domains | Video_Processing, GPU_Computing, Data_Augmentation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete GPU-accelerated spatial crop operator for extracting fixed-size regions from video frame sequences, provided by the NVIDIA DALI library.
Description
fn.crop is a DALI pipeline operator that extracts a spatial sub-region of specified dimensions from its input. When applied to video sequences of shape [F, H, W, C], it crops the same spatial region from every frame, producing an output of shape [F, crop_h, crop_w, C]. The crop position is controlled by the crop_pos_x and crop_pos_y parameters, which accept normalized coordinates in the range [0.0, 1.0]. Each value places the crop window's upper-left corner within the range of valid offsets: 0.0 aligns the window with the left (or top) edge of the input, 1.0 aligns it with the right (or bottom) edge, and the pixel offset is the normalized value multiplied by the difference between the input and crop extents, for example crop_pos_x * (W - crop_w).
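The mapping from normalized position to pixel offset can be sketched in plain Python. This is an illustration rather than DALI code: crop_anchor is a hypothetical helper, and rounding to the nearest integer is an assumption of the sketch.

```python
def crop_anchor(pos_x, pos_y, in_h, in_w, crop_h, crop_w):
    """Map normalized crop positions to the (top, left) pixel offset of the window."""
    if crop_h > in_h or crop_w > in_w:
        raise ValueError("crop size exceeds input size")
    # Valid offsets span [0, in - crop]; the normalized position scales that range.
    # Rounding to the nearest integer is an assumption of this sketch.
    top = int(round(pos_y * (in_h - crop_h)))
    left = int(round(pos_x * (in_w - crop_w)))
    return top, left


# A 256x256 crop taken from a 720x1280 frame.
print(crop_anchor(0.0, 0.0, 720, 1280, 256, 256))  # (0, 0): flush with the top-left corner
print(crop_anchor(0.5, 0.5, 720, 1280, 256, 256))  # (232, 512): centered window
print(crop_anchor(1.0, 1.0, 720, 1280, 256, 256))  # (464, 1024): flush with the bottom-right corner
```

Because the position scales the range of valid offsets rather than the full frame size, any value in [0.0, 1.0] keeps the window inside the frame.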
In the video super-resolution pipeline, the crop position is randomized by feeding the output of fn.random.uniform(range=(0.0, 1.0)) into both crop_pos_x and crop_pos_y. Because DALI evaluates these random operators once per sample (not once per frame), the same crop position is applied consistently across all frames in a given sequence, maintaining temporal coherence.
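The per-sample behaviour can be emulated without DALI. In the sketch below, make_sequence and crop_sequence are hypothetical helpers operating on nested lists; each toy pixel stores its own (y, x) coordinate so the chosen window is easy to inspect, and nearest-integer rounding of the offset is an assumption.

```python
import random


def make_sequence(num_frames, height, width):
    """Toy sequence: frames[f][y][x] holds the pixel's own (y, x) coordinate."""
    return [[[(y, x) for x in range(width)] for y in range(height)]
            for _ in range(num_frames)]


def crop_sequence(frames, crop_h, crop_w, pos_x, pos_y):
    """Apply a single crop window to every frame of one sequence."""
    in_h, in_w = len(frames[0]), len(frames[0][0])
    top = int(round(pos_y * (in_h - crop_h)))
    left = int(round(pos_x * (in_w - crop_w)))
    return [[row[left:left + crop_w] for row in frame[top:top + crop_h]]
            for frame in frames]


rng = random.Random(0)
batch = [make_sequence(num_frames=3, height=8, width=10) for _ in range(4)]

cropped_batch = []
for frames in batch:
    # One random draw per sample, reused for all frames of that sample.
    pos_x = rng.uniform(0.0, 1.0)
    pos_y = rng.uniform(0.0, 1.0)
    cropped_batch.append(crop_sequence(frames, 4, 4, pos_x, pos_y))

for index, cropped in enumerate(cropped_batch):
    # The top-left pixel of each frame reveals where the window was placed.
    print(index, [frame[0][0] for frame in cropped])
```

Within each sample the printed corners are identical for all frames, while different samples generally receive different windows.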
The operator can also convert the output type via the dtype parameter. When dtype=types.FLOAT is specified on a UINT8 input, the pixel values are cast from the [0, 255] integer range to the [0.0, 255.0] floating-point range; the values themselves are not rescaled. This conversion is fused with the crop operation, avoiding a separate type-cast kernel launch.
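The difference between casting and normalizing can be shown with a small plain-Python sketch. The helper names are hypothetical, and the rescaling step is an optional extra that a downstream model might want, not something the source pipeline is stated to perform.

```python
def cast_to_float(pixels):
    """Convert integer pixel values to floats without changing their magnitude."""
    return [float(p) for p in pixels]


def rescale_to_unit_range(pixels):
    """Optional extra step: map [0.0, 255.0] values to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


uint8_pixels = [0, 128, 255]
as_float = cast_to_float(uint8_pixels)   # values stay in [0.0, 255.0]
unit = rescale_to_unit_range(as_float)   # values now lie in [0.0, 1.0]
print(as_float)
print(unit[0], unit[-1])
```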
The crop parameter accepts a list of two values [height, width] specifying the output spatial dimensions. If the requested crop size exceeds the input dimensions, the operation fails by default, so the crop size must be chosen to fit within the smallest video resolution in the dataset.
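A crop size can be checked against the dataset before the pipeline is built. The sketch below assumes the (height, width) of each video is already known, for example from a metadata pass; largest_common_crop and crop_fits are hypothetical helpers, not part of DALI.

```python
def largest_common_crop(resolutions):
    """Return the largest (height, width) that fits inside every input resolution."""
    if not resolutions:
        raise ValueError("no resolutions given")
    # The limiting height and width may come from different videos.
    return min(h for h, _ in resolutions), min(w for _, w in resolutions)


def crop_fits(crop_size, resolutions):
    """Check that a [height, width] crop fits inside every video."""
    crop_h, crop_w = crop_size
    max_h, max_w = largest_common_crop(resolutions)
    return crop_h <= max_h and crop_w <= max_w


resolutions = [(720, 1280), (1080, 1920), (480, 854)]
print(largest_common_crop(resolutions))    # (480, 854)
print(crop_fits([256, 256], resolutions))  # True
print(crop_fits([512, 512], resolutions))  # False
```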
Usage
Use fn.crop immediately after the video reader to extract random spatial patches from decoded video sequences. It is a common way to combine spatial augmentation with resolution normalization in DALI video pipelines.
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (lines 23-25)
Signature
fn.crop(
    images,
    crop=crop_size,
    dtype=types.FLOAT,
    crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
    crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
)
Import
import nvidia.dali.fn as fn
import nvidia.dali.types as types
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| images | DALI TensorGPU | Yes | Input video tensor with shape [F, H, W, C] in FHWC layout |
| crop | list of int [height, width] | Yes | Target spatial dimensions of the output crop |
| dtype | types.DALIDataType | No | Output data type; types.FLOAT casts UINT8 pixels to float32 |
| crop_pos_x | float or DALI DataNode | No | Normalized horizontal crop position in [0.0, 1.0]; 0.0 aligns the crop window with the left edge, 1.0 with the right edge (default 0.5) |
| crop_pos_y | float or DALI DataNode | No | Normalized vertical crop position in [0.0, 1.0]; 0.0 aligns the crop window with the top edge, 1.0 with the bottom edge (default 0.5) |
Outputs
| Name | Type | Description |
|---|---|---|
| cropped_images | DALI TensorGPU | Cropped video tensor with shape [F, crop_h, crop_w, C] in FHWC layout, dtype FLOAT |
Usage Examples
Random Spatial Crop in Video Pipeline
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    # Decode fixed-length frame sequences on the GPU as UINT8 RGB.
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
    # Crop one random window per sequence and cast the result to float.
    images = fn.crop(
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    return images
Fixed Center Crop
# Center crop by setting both position parameters to 0.5
images = fn.crop(
    images,
    crop=[256, 256],
    dtype=types.FLOAT,
    crop_pos_x=0.5,
    crop_pos_y=0.5
)