Implementation: NVIDIA DALI fn.decoders.image
| Knowledge Sources | Details |
|---|---|
| Domains | Image_Processing, GPU_Computing, Image_Decoding |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete operator that decodes compressed image data into pixel tensors with GPU hardware acceleration, provided by the nvidia.dali.fn.decoders module.
Description
fn.decoders.image decodes encoded image byte streams (JPEG, PNG, BMP, TIFF, WebP, etc.) into dense pixel tensors. When configured with device="mixed", the operator parses image headers on the CPU and dispatches the pixel decompression to the GPU via the nvJPEG library, using the GPU's dedicated hardware JPEG decode engine where one is available, and produces output tensors directly in GPU memory.
Key behaviors:
- device="mixed" enables the CPU-parse / GPU-decode split that maximizes throughput for JPEG images.
- output_type=types.RGB produces 3-channel RGB output regardless of the source color space.
- jpeg_fancy_upsampling=True uses a higher-quality interpolation filter for chroma channel upsampling in JPEG images, improving visual quality.
- use_fast_idct=False uses the standard-accuracy inverse DCT implementation for better decode fidelity (the sketch after this list contrasts the fast and accurate settings).
- The output tensor has layout [H, W, C] with uint8 values in the range [0, 255].
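The two JPEG-quality knobs are easiest to see side by side. Below is a minimal sketch (sample.jpg is a hypothetical path to any local JPEG) that decodes the same bytes once with the accurate settings and once with the fast ones, then reports the pixel-level difference:

```python
import numpy as np
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def quality_pipe():
    data = fn.external_source(name="jpegs", dtype=types.UINT8, ndim=1)
    # Accurate path: fancy chroma upsampling + standard-accuracy IDCT.
    accurate = fn.decoders.image(
        data, device="mixed", output_type=types.RGB,
        jpeg_fancy_upsampling=True, use_fast_idct=False,
    )
    # Fast path: plain chroma upsampling + fast IDCT.
    fast = fn.decoders.image(
        data, device="mixed", output_type=types.RGB,
        jpeg_fancy_upsampling=False, use_fast_idct=True,
    )
    return accurate, fast

pipe = quality_pipe()
pipe.build()
pipe.feed_input("jpegs", [np.fromfile("sample.jpg", dtype=np.uint8)])  # hypothetical file
accurate, fast = pipe.run()
a = accurate.as_cpu().at(0).astype(np.int16)
b = fast.as_cpu().at(0).astype(np.int16)
print("max abs pixel difference:", np.abs(a - b).max())
```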
Usage
Place fn.decoders.image immediately after the data-source operator in the pipeline graph, and set device="mixed" for GPU-accelerated decoding. The operator accepts a DataNode containing encoded image bytes and returns a DataNode containing the decoded pixel tensor, which resides on the GPU when device="mixed".
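As a sketch of that placement, pairing the decoder with fn.readers.file as the data source (the file_root path below is a placeholder):

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def file_decode_pipe():
    # Data source first: emits (encoded bytes, label) pairs.
    encoded, labels = fn.readers.file(file_root="/data/images")  # placeholder path
    # Decoder immediately after the source; output lands in GPU memory.
    images = fn.decoders.image(encoded, device="mixed", output_type=types.RGB)
    return images, labels
```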
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/zoo/images/decode.py (lines 42-47)
- File: docs/examples/zoo/images/decode_and_transform_pytorch.py (lines 77-83)
Signature
```python
fn.decoders.image(
    inputs,
    device="mixed",
    output_type=types.RGB,
    jpeg_fancy_upsampling=True,
    use_fast_idct=False,
)
```
Import
```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
# or
from nvidia.dali import fn, types
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| inputs | DataNode | Yes | Encoded image bytes as a 1-D uint8 tensor (one per sample in the batch) |
| device | str | No | Device placement: "cpu" (CPU-only decode) or "mixed" (CPU header parse + GPU decode). Default: "cpu" |
| output_type | types.DALIImageType | No | Desired output color format: types.RGB, types.BGR, types.GRAY, types.ANY_DATA. Default: types.RGB |
| jpeg_fancy_upsampling | bool | No | Use high-quality chroma upsampling for JPEG. Default: False |
| use_fast_idct | bool | No | Use faster but less accurate inverse DCT for JPEG. Default: False |
Outputs
| Name | Type | Description |
|---|---|---|
| decoded | DataNode (GPU) | Decoded image tensor with layout [H, W, C] and dtype uint8 (C = 3 for output_type=types.RGB), residing in GPU memory when device="mixed" |
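A small sketch exercising this contract end to end (sample.jpg is a placeholder path): a 1-D uint8 array of encoded bytes goes in, and a 3-channel HWC uint8 tensor comes out on the GPU.

```python
import numpy as np
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def contract_pipe():
    data = fn.external_source(name="encoded", dtype=types.UINT8, ndim=1)
    return fn.decoders.image(data, device="mixed", output_type=types.RGB)

pipe = contract_pipe()
pipe.build()
# Input contract: one 1-D uint8 array of encoded bytes per sample.
pipe.feed_input("encoded", [np.fromfile("sample.jpg", dtype=np.uint8)])  # placeholder
(out,) = pipe.run()           # TensorListGPU: data stays on the device
img = out.as_cpu().at(0)      # copy back only to inspect the contract
assert img.dtype == np.uint8 and img.ndim == 3 and img.shape[2] == 3  # [H, W, 3]
```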
Usage Examples
Example: Basic GPU Decode
```python
import numpy as np
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def(batch_size=4, num_threads=4, device_id=0, exec_dynamic=True)
def decode_pipeline(source_name):
    # Encoded bytes enter the graph on the CPU through a named external source.
    inputs = fn.external_source(
        device="cpu",
        name=source_name,
        no_copy=False,
        blocking=True,
        dtype=types.UINT8,
    )
    # Mixed decode: CPU header parsing, GPU pixel decompression.
    decoded = fn.decoders.image(
        inputs,
        device="mixed",
        output_type=types.RGB,
        jpeg_fancy_upsampling=True,
    )
    return decoded


pipe = decode_pipeline("encoded_img", prefetch_queue_depth=1)
pipe.build()
```
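To drive the pipeline built above, feed the named external source and pull a batch; a sketch, with photo.jpg standing in for any JPEG on disk:

```python
import numpy as np

# One encoded JPEG per sample (batch_size=4 above).
batch = [np.fromfile("photo.jpg", dtype=np.uint8) for _ in range(4)]
pipe.feed_input("encoded_img", batch)
(decoded,) = pipe.run()
print(decoded.shape())  # four (H, W, 3) shapes; the data itself stays on the GPU
```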
Example: Decode with Fast IDCT Disabled
```python
from nvidia.dali import pipeline_def, fn, types


@pipeline_def
def image_pipe(img_hw=(320, 200)):
    # no_copy=True skips the copy on feed_input; the fed buffers must remain
    # alive and unmodified until the batch has been consumed.
    encoded_images = fn.external_source(name="images", no_copy=True)
    decoded = fn.decoders.image(
        encoded_images,
        device="mixed",
        output_type=types.RGB,
        use_fast_idct=False,          # standard-accuracy inverse DCT
        jpeg_fancy_upsampling=True,   # high-quality chroma upsampling
    )
    # fn.resize consumes the GPU-resident output, so it also runs on the GPU.
    images = fn.resize(decoded, size=img_hw, interp_type=types.INTERP_LINEAR)
    return images
```
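A sketch of driving this pipeline (photo.jpg is a placeholder path); since @pipeline_def was applied without arguments, the standard pipeline kwargs are supplied at call time:

```python
import numpy as np

pipe = image_pipe(img_hw=(320, 200), batch_size=2, num_threads=2, device_id=0)
pipe.build()
# With no_copy=True, the fed arrays must stay unmodified until the batch is used.
batch = [np.fromfile("photo.jpg", dtype=np.uint8)] * 2
pipe.feed_input("images", batch)
(resized,) = pipe.run()
print(resized.shape())  # two resized (H, W, 3) shapes, per img_hw
```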