Implementation:NVIDIA DALI Fn Transpose

From Leeroopedia


Knowledge Sources
Domains Video_Processing, GPU_Computing, Tensor_Operations
Last Updated 2026-02-08 00:00 GMT

Overview

fn.transpose is the NVIDIA DALI operator that permutes the axes of a tensor inside a DALI pipeline. In the pipeline described here it runs on the GPU.

Description

fn.transpose is a DALI pipeline operator that permutes the axes of an input tensor according to a specified permutation vector. In the video super-resolution pipeline, it converts video frame sequences from the FHWC (Frames, Height, Width, Channels) layout produced by the video reader and crop operators into the CFHW (Channels, Frames, Height, Width) layout that PyTorch's 3D convolutional layers expect for each sample, before the batch dimension is added.

The perm parameter specifies the axis permutation as a list of integers, where each element at index i indicates which source axis becomes the i-th axis in the output. The permutation [3, 0, 1, 2] maps:

  • Output axis 0 <- Input axis 3 (C: Channels, 3 for RGB)
  • Output axis 1 <- Input axis 0 (F: Frames/sequence_length)
  • Output axis 2 <- Input axis 1 (H: Height)
  • Output axis 3 <- Input axis 2 (W: Width)

This transforms a tensor of shape [F, crop_h, crop_w, 3] into [3, F, crop_h, crop_w]. When the DALI iterator batches the samples, the final tensor has shape [B, 3, F, crop_h, crop_w]. This is the BCFHW layout that PyTorch's 3D convolutions expect, written (N, C, D, H, W) in the PyTorch documentation, with the frame axis acting as depth.
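The shape arithmetic above can be checked without DALI. The following is a minimal pure-Python sketch: permute_shape is a hypothetical helper (not part of DALI) that follows the same convention as perm, and the sizes used (8 frames, a 256x256 crop, batch size 4) are illustrative assumptions.

```python
def permute_shape(shape, perm):
    """Return the shape after an axis permutation.

    Output axis i takes the extent of input axis perm[i], the same
    convention the text describes for the perm argument.
    Hypothetical helper for illustration; not a DALI function.
    """
    return tuple(shape[axis] for axis in perm)


# Illustrative sizes (assumptions): 8 frames, 256x256 crop, RGB.
fhwc_shape = (8, 256, 256, 3)            # [F, crop_h, crop_w, C]
cfhw_shape = permute_shape(fhwc_shape, [3, 0, 1, 2])
print(cfhw_shape)                         # (3, 8, 256, 256)

# Batching prepends a batch axis, giving [B, C, F, H, W].
batch_size = 4
bcfhw_shape = (batch_size,) + cfhw_shape
print(bcfhw_shape)                        # (4, 3, 8, 256, 256)
```

Only the order of the extents changes; the total number of elements is the same before and after the transpose.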

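Because the video reader produces GPU data, the transpose runs on the GPU inside the DALI pipeline. The pipeline executes asynchronously and prefetches batches, so data preparation overlaps with the work of the consumer (for example, the training step) instead of stalling it.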

Usage

Use fn.transpose as the final transformation in a DALI video pipeline, after reading and cropping but before the data is handed to the framework iterator. This is the standard approach for converting DALI's native channel-last output to PyTorch's expected channel-first input.

Code Reference

Source Location

  • Repository: NVIDIA DALI
  • File: docs/examples/use_cases/video_superres/dataloading/dataloaders.py (line 27)

Signature

fn.transpose(images, perm=[3, 0, 1, 2])

Import

import nvidia.dali.fn as fn

I/O Contract

Inputs

Name | Type | Required | Description
images | DALI TensorGPU | Yes | Input tensor in FHWC layout with shape [F, H, W, C]
perm | list of int | No (if omitted, the axes are reversed); set explicitly here | Permutation vector specifying the new axis order; [3, 0, 1, 2] for FHWC-to-CFHW

Outputs

Name | Type | Description
transposed_images | DALI TensorGPU | Output tensor in CFHW layout with shape [C, F, H, W]
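The contract above (FHWC in, CFHW out) can be expressed as a small check on layout strings. This is a sketch under stated assumptions: transposed_layout is a hypothetical helper written for this page, not a DALI call, and it only mirrors the perm convention described earlier.

```python
def transposed_layout(layout, perm):
    """Permute a layout string such as "FHWC" the way perm permutes axes.

    Hypothetical helper for illustration; not part of DALI.
    """
    ndim = len(layout)
    # A valid perm uses every axis index 0..ndim-1 exactly once.
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"perm={perm} is not a permutation of {ndim} axes")
    return "".join(layout[axis] for axis in perm)


print(transposed_layout("FHWC", [3, 0, 1, 2]))  # CFHW
print(transposed_layout("HWC", [2, 0, 1]))      # CHW

# A perm with a repeated or out-of-range axis is rejected.
try:
    transposed_layout("FHWC", [3, 0, 1, 1])
except ValueError as err:
    print("rejected:", err)
```

A check like this is cheap to run before building a pipeline, since a wrong perm otherwise tends to surface later as a shape mismatch in the model.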

Usage Examples

FHWC to CFHW Transposition in Video Pipeline

from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size):
    images = fn.readers.video(
        device="gpu",
        filenames=files,
        sequence_length=sequence_length,
        normalized=False,
        random_shuffle=True,
        image_type=types.RGB,
        dtype=types.UINT8,
        initial_fill=16,
        pad_last_batch=True,
        name="Reader"
    )
    images = fn.crop(
        images,
        crop=crop_size,
        dtype=types.FLOAT,
        crop_pos_x=fn.random.uniform(range=(0.0, 1.0)),
        crop_pos_y=fn.random.uniform(range=(0.0, 1.0))
    )
    # Transpose from FHWC to CFHW for PyTorch conv layers
    images = fn.transpose(images, perm=[3, 0, 1, 2])
    return images

HWC to CHW Transposition for Single Images

# For single images (no frame dimension), use perm=[2, 0, 1]
# to convert HWC -> CHW
images = fn.transpose(images, perm=[2, 0, 1])
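To make the element-level effect of perm=[2, 0, 1] concrete, here is a pure-Python sketch on a nested-list image. transpose_hwc_to_chw is a hypothetical stand-in written for illustration (it is not how DALI implements the operator), and the 2x2 pixel values are made up.

```python
from itertools import product


def transpose_hwc_to_chw(image):
    """Pure-Python equivalent of perm=[2, 0, 1] on a nested-list HWC image.

    Hypothetical helper for illustration; not part of DALI.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# Toy 2x2 RGB image (assumed values); each pixel is [R, G, B].
hwc = [
    [[10, 20, 30], [11, 21, 31]],
    [[12, 22, 32], [13, 23, 33]],
]
chw = transpose_hwc_to_chw(hwc)

# Every element keeps its value; only its index order changes.
for h, w, c in product(range(2), range(2), range(3)):
    assert chw[c][h][w] == hwc[h][w][c]

print(chw[0])  # red plane: [[10, 11], [12, 13]]
```

The same index relationship holds for the video case with one more axis: the element at input index [f][h][w][c] ends up at output index [c][f][h][w].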

Related Pages

Implements Principle

Requires Environment
