Implementation:NVIDIA DALI Fn Transpose

Knowledge Sources	NVIDIA DALI
Domains	Video_Processing, GPU_Computing, Tensor_Operations
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete operator, provided by the NVIDIA DALI library, that permutes the axes of a tensor on the GPU within a DALI pipeline.

Description

fn.transpose is a DALI pipeline operator that permutes the axes of an input tensor according to a specified permutation vector. In the video super-resolution pipeline, it converts video frame sequences from the FHWC (Frames, Height, Width, Channels) layout produced by the video reader and crop operators into the CFHW (Channels, Frames, Height, Width) layout required by PyTorch's convolutional layers.

The perm parameter specifies the axis permutation as a list of integers, where each element at index i indicates which source axis becomes the i-th axis in the output. The permutation [3, 0, 1, 2] maps:

Output axis 0 <- Input axis 3 (C: Channels, 3 for RGB)
Output axis 1 <- Input axis 0 (F: Frames/sequence_length)
Output axis 2 <- Input axis 1 (H: Height)
Output axis 3 <- Input axis 2 (W: Width)

This transforms a tensor of shape [F, crop_h, crop_w, 3] into [3, F, crop_h, crop_w]. When batched by the DALI iterator, the final tensor shape becomes [B, 3, F, crop_h, crop_w], the BCFHW layout that PyTorch's 3D convolutions expect (PyTorch documents it as (N, C, D, H, W), with the frame axis as depth D).
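The index convention can be checked without a GPU. The following is a minimal pure-Python sketch, not DALI code: `permute_shape` is a hypothetical helper, and the sizes (16 frames, a 256x256 crop, a batch of 4) are assumed for illustration only.

```python
def permute_shape(shape, perm):
    """Return the shape after an axis permutation, using the convention
    that output axis i takes its extent from input axis perm[i]."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the input axes")
    return [shape[p] for p in perm]


# Assumed example sizes: F=16 frames, 256x256 crop, 3 channels (FHWC).
fhwc_shape = [16, 256, 256, 3]
cfhw_shape = permute_shape(fhwc_shape, [3, 0, 1, 2])
print(cfhw_shape)  # [3, 16, 256, 256]

# Batching adds a leading batch axis (B=4 assumed here).
batched_shape = [4] + cfhw_shape
print(batched_shape)  # [4, 3, 16, 256, 256]
```

The helper only tracks shapes; it moves no data, so it is useful for reasoning about a perm value before wiring it into a pipeline.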

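Because its input is produced on the GPU by the video reader and crop operators, the transpose also runs on the GPU. It executes as part of the DALI pipeline's asynchronous execution, so it can overlap with other pipeline stages and reduce end-to-end data-loading latency.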

Usage

Use fn.transpose as the final transformation in a DALI video pipeline, after reading and cropping but before the data is handed to the framework iterator. This is the standard approach for converting DALI's native channel-last output to PyTorch's expected channel-first input.

Code Reference

Source Location

Repository: NVIDIA DALI
File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (line 27)

Signature

fn.transpose(images, perm=[3, 0, 1, 2])

Import

import nvidia.dali.fn as fn

I/O Contract

Inputs

Name	Type	Required	Description
images	DALI TensorGPU	Yes	Input tensor in FHWC layout with shape [F, H, W, C]
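perm	list of int	Yes (for this use case)	Permutation vector specifying the new axis order; [3, 0, 1, 2] for FHWC-to-CFHW. DALI itself treats perm as optional and reverses the axis order when it is omitted, which would not yield CFHW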

Outputs

Name	Type	Description
transposed_images	DALI TensorGPU	Output tensor in CFHW layout with shape [C, F, H, W]
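When the CFHW output has to be brought back to FHWC (for example, to inspect individual frames), the inverse permutation is needed. The sketch below is plain Python rather than DALI code; `inverse_perm` is a hypothetical helper that assumes the same convention as above, where output axis i is taken from input axis perm[i].

```python
def inverse_perm(perm):
    """Return the permutation that undoes `perm`, where `perm` follows
    the convention: output axis i <- input axis perm[i]."""
    inverse = [0] * len(perm)
    for out_axis, in_axis in enumerate(perm):
        # Input axis `in_axis` ended up at position `out_axis`,
        # so the inverse must fetch it back from there.
        inverse[in_axis] = out_axis
    return inverse


to_cfhw = [3, 0, 1, 2]            # FHWC -> CFHW
to_fhwc = inverse_perm(to_cfhw)   # CFHW -> FHWC
print(to_fhwc)  # [1, 2, 3, 0]
```

The resulting list could then be passed as perm to a second transpose, under the assumption that the layout round trip is actually needed.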

Usage Examples

FHWC to CFHW Transposition in Video Pipeline

from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
    images = fn.crop(
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    # Transpose from FHWC to CFHW for PyTorch conv layers
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images

HWC to CHW Transposition for Single Images

# For single images (no frame dimension), use perm=[2, 0, 1]
# to convert HWC -> CHW
images = fn.transpose(images, perm=[2, 0, 1])
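Both perm values used on this page can be derived mechanically from the layout strings. The following is a small pure-Python sketch; `perm_from_layouts` is a hypothetical helper, not part of DALI, and it assumes each layout string uses one unique letter per axis.

```python
def perm_from_layouts(src, dst):
    """Derive a perm list (output axis i <- input axis perm[i]) that
    rearranges layout `src` into layout `dst`, e.g. "FHWC" -> "CFHW"."""
    if len(set(src)) != len(src) or sorted(src) != sorted(dst):
        raise ValueError("layouts must contain the same unique axis letters")
    # For each axis letter in the target layout, look up where it
    # sits in the source layout.
    return [src.index(axis) for axis in dst]


print(perm_from_layouts("FHWC", "CFHW"))  # [3, 0, 1, 2]
print(perm_from_layouts("HWC", "CHW"))    # [2, 0, 1]
```

Deriving the list this way avoids hand-counting axes when a pipeline's layout changes.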

Related Pages

Implements Principle

Principle:NVIDIA_DALI_Tensor_Layout_Transposition

Requires Environment

Environment:NVIDIA_DALI_CUDA_GPU_Environment
