Implementation:NVIDIA DALI Fn Transpose
| Knowledge Sources | |
|---|---|
| Domains | Video_Processing, GPU_Computing, Tensor_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Tensor axis-permutation operator from the NVIDIA DALI library, used to reorder the dimensions of a tensor inside a DALI pipeline. It runs on the GPU when its input is GPU data, as it is in the video super-resolution example.
Description
fn.transpose is a DALI pipeline operator that permutes the axes of an input tensor according to a specified permutation vector. In the video super-resolution pipeline, it converts video frame sequences from the FHWC (Frames, Height, Width, Channels) layout produced by the video reader and crop operators into the CFHW (Channels, Frames, Height, Width) layout required by PyTorch's convolutional layers.
The perm parameter specifies the axis permutation as a list of integers, where each element at index i indicates which source axis becomes the i-th axis in the output. The permutation [3, 0, 1, 2] maps:
- Output axis 0 <- Input axis 3 (C: Channels, size 3 for RGB)
- Output axis 1 <- Input axis 0 (F: Frames/sequence_length)
- Output axis 2 <- Input axis 1 (H: Height)
- Output axis 3 <- Input axis 2 (W: Width)
This transforms a tensor of shape [F, crop_h, crop_w, 3] into [3, F, crop_h, crop_w]. When the DALI iterator batches the samples, the final tensor shape becomes [B, 3, F, crop_h, crop_w], which matches the (N, C, D, H, W) input layout expected by PyTorch's 3D convolutions (torch.nn.Conv3d), with frames acting as the depth axis D.
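The index mapping can be checked with a small reference implementation. The sketch below is plain Python, not DALI's GPU kernel: `reference_transpose` is a hypothetical helper that assumes a dense, contiguous row-major input and, for every output index, gathers the input element whose coordinate along axis `perm[i]` equals the output coordinate along axis `i`.

```python
from itertools import product
from math import prod


def reference_transpose(data, shape, perm):
    """Transpose a dense row-major flat buffer `data` with the given `shape`.

    Output axis i takes input axis perm[i], so the output element at index
    (i_0, ..., i_{n-1}) is the input element whose coordinate along axis
    perm[k] is i_k. Returns (flat_output, output_shape).
    """
    out_shape = [shape[p] for p in perm]
    # Row-major strides of the input: stride of axis k = product of later dims.
    in_strides = [prod(shape[k + 1:]) for k in range(len(shape))]
    out = []
    # Visit output indices in row-major order and gather from the input.
    for out_idx in product(*(range(n) for n in out_shape)):
        in_offset = sum(out_idx[i] * in_strides[perm[i]] for i in range(len(perm)))
        out.append(data[in_offset])
    return out, out_shape


# Tiny FHWC clip: F=2 frames, H=2, W=3, C=3 channels; values are flat offsets.
F, H, W, C = 2, 2, 3, 3
fhwc = list(range(F * H * W * C))
cfhw, shape = reference_transpose(fhwc, [F, H, W, C], [3, 0, 1, 2])
print(shape)     # [3, 2, 2, 3]
print(cfhw[:3])  # [0, 3, 6]: channel 0 of frame 0, row 0, columns 0..2
```

Every element satisfies out[c, f, h, w] == in[f, h, w, c], which is exactly the relationship the permutation list above describes.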
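Because its input comes from the GPU video reader, DALI places the operator on the GPU, so the permutation runs there without a round trip through host memory. Like the other operators, it is scheduled by DALI's asynchronous executor, which prepares upcoming batches while the training loop consumes the current one, so the cost of the transpose is largely hidden behind training compute.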
Usage
Use fn.transpose as the last transformation in a DALI video pipeline: after reading and cropping, and before the data is handed to the framework iterator. It converts the channel-last layout produced by DALI's readers and decoders into the channel-first layout that PyTorch expects.
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (line 27)
Signature
```python
fn.transpose(images, perm=[3, 0, 1, 2])
```
Import
```python
import nvidia.dali.fn as fn
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| images | DataNode (GPU) | Yes | Input tensor in FHWC layout with shape [F, H, W, C] |
| perm | list of int | Yes | Permutation vector specifying the new axis order; [3, 0, 1, 2] for FHWC-to-CFHW |
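Because each output axis takes exactly one input axis, `perm` has to list every input axis exactly once. The hypothetical helpers below check that condition and compute the inverse permutation, which undoes the transpose (for example, turning CFHW frames back into FHWC). They are illustrative only and do not reproduce DALI's own argument validation.

```python
def validate_perm(perm, ndim):
    """Raise ValueError unless perm is a permutation of 0..ndim-1."""
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"perm {perm} is not a permutation of 0..{ndim - 1}")


def inverse_perm(perm):
    """Return the permutation that undoes `perm`: inv[perm[i]] = i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


validate_perm([3, 0, 1, 2], ndim=4)  # FHWC -> CFHW: valid, no exception
print(inverse_perm([3, 0, 1, 2]))    # [1, 2, 3, 0]  (CFHW -> FHWC)
print(inverse_perm([2, 0, 1]))       # [1, 2, 0]     (CHW -> HWC)
```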
Outputs
| Name | Type | Description |
|---|---|---|
| transposed_images | DataNode (GPU) | Output tensor in CFHW layout with shape [C, F, H, W] |
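DALI's fn.transpose also accepts a `transpose_layout` argument (default `True`) that permutes the axis names of the layout along with the data, so, assuming the input carries the `FHWC` layout, the output is labelled `CFHW`. The sketch below mimics that relabelling on a plain layout string; `transposed_layout` is a hypothetical helper, not part of DALI.

```python
def transposed_layout(layout, perm):
    """Permute a layout string the same way the data axes are permuted.

    Output axis i is labelled with the name of input axis perm[i].
    """
    if len(layout) != len(perm):
        raise ValueError("layout and perm must have the same length")
    return "".join(layout[p] for p in perm)


print(transposed_layout("FHWC", [3, 0, 1, 2]))  # CFHW
print(transposed_layout("HWC", [2, 0, 1]))      # CHW
```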
Usage Examples
FHWC to CFHW Transposition in Video Pipeline
```python
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
    images = fn.crop(
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    # Transpose from FHWC to CFHW for PyTorch conv layers
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images
```
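As a sanity check on what the training loop receives, the hypothetical helper below computes the batch shape this pipeline should produce. It assumes `crop_size` is given as `(crop_h, crop_w)`, frames are RGB, and the iterator stacks samples along a new leading batch axis; the result is in the N, C, D, H, W order used by `torch.nn.Conv3d`.

```python
def expected_batch_shape(batch_size, sequence_length, crop_size, channels=3):
    """Shape of one batch after crop + transpose(perm=[3, 0, 1, 2]).

    Hypothetical helper: assumes crop_size is (crop_h, crop_w) and RGB frames.
    """
    crop_h, crop_w = crop_size
    per_sample_fhwc = (sequence_length, crop_h, crop_w, channels)
    perm = (3, 0, 1, 2)
    per_sample_cfhw = tuple(per_sample_fhwc[p] for p in perm)
    # The framework iterator stacks samples along a new leading batch axis.
    return (batch_size,) + per_sample_cfhw


print(expected_batch_shape(batch_size=4, sequence_length=5, crop_size=(128, 256)))
# (4, 3, 5, 128, 256)
```

A non-square crop is used here so that the height and width positions in the result can be told apart.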
HWC to CHW Transposition for Single Images
```python
# For single images (no frame dimension), use perm=[2, 0, 1]
# to convert HWC -> CHW
images = fn.transpose(images, perm=[2, 0, 1])
```
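Both permutations on this page follow the same rule: move the last (channel) axis to the front and keep the remaining axes in their original order. The hypothetical helper below builds that permutation for any rank; it is a convenience sketch, not a DALI function.

```python
def channels_first_perm(ndim):
    """Perm that moves the last (channel) axis to the front, keeping the
    relative order of the other axes: HWC -> CHW, FHWC -> CFHW, etc."""
    if ndim < 1:
        raise ValueError("ndim must be >= 1")
    return [ndim - 1] + list(range(ndim - 1))


print(channels_first_perm(3))  # [2, 0, 1]     HWC  -> CHW
print(channels_first_perm(4))  # [3, 0, 1, 2]  FHWC -> CFHW
```

Since `perm` is a plain list of integers when the pipeline is defined, the result could be passed directly, e.g. `fn.transpose(images, perm=channels_first_perm(4))`.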