Implementation: NVIDIA DALI fn.decoders.image
| Knowledge Sources | Details |
|---|---|
| Domains | Image_Processing, GPU_Computing, Image_Decoding |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete operator that decodes compressed image data into pixel tensors with GPU hardware acceleration, provided by the nvidia.dali.fn.decoders module.
Description
fn.decoders.image decodes encoded image byte streams (JPEG, PNG, BMP, TIFF, WebP, etc.) into dense pixel tensors. When configured with device="mixed", the operator parses image headers on the CPU and dispatches the pixel decompression to the GPU via the nvJPEG library, using the GPU's dedicated hardware JPEG decode engine where one is available, and produces output tensors directly in GPU memory.
Key behaviors:
- device="mixed" enables the CPU-parse / GPU-decode split that maximizes throughput for JPEG images.
- output_type=types.RGB produces 3-channel RGB output regardless of the source color space.
- jpeg_fancy_upsampling=True uses a higher-quality interpolation filter for chroma channel upsampling in JPEG images, improving visual quality.
- use_fast_idct=False uses the standard-accuracy inverse DCT implementation for better decode fidelity (the sketch after this list contrasts the fast and accurate settings).
- The output tensor has layout [H, W, C] with uint8 values in the range [0, 255].
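The two JPEG-quality knobs are easiest to see side by side. Below is a minimal sketch (sample.jpg is a hypothetical path to any local JPEG) that decodes the same bytes once with the accurate settings and once with the fast ones, then reports the pixel-level difference:

```python
import numpy as np
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def quality_pipe():
    data = fn.external_source(name="jpegs", dtype=types.UINT8, ndim=1)
    # Accurate path: fancy chroma upsampling + standard-accuracy IDCT.
    accurate = fn.decoders.image(
        data, device="mixed", output_type=types.RGB,
        jpeg_fancy_upsampling=True, use_fast_idct=False,
    )
    # Fast path: plain chroma upsampling + fast IDCT.
    fast = fn.decoders.image(
        data, device="mixed", output_type=types.RGB,
        jpeg_fancy_upsampling=False, use_fast_idct=True,
    )
    return accurate, fast

pipe = quality_pipe()
pipe.build()
pipe.feed_input("jpegs", [np.fromfile("sample.jpg", dtype=np.uint8)])  # hypothetical file
accurate, fast = pipe.run()
a = accurate.as_cpu().at(0).astype(np.int16)
b = fast.as_cpu().at(0).astype(np.int16)
print("max abs pixel difference:", np.abs(a - b).max())
```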
Usage
Place fn.decoders.image immediately after the data-source operator in the pipeline graph, and set device="mixed" for GPU-accelerated decoding. The operator accepts a DataNode containing encoded image bytes and returns a DataNode containing the decoded pixel tensor, which resides on the GPU when device="mixed".
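As a sketch of that placement, pairing the decoder with fn.readers.file as the data source (the file_root path below is a placeholder):

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def file_decode_pipe():
    # Data source first: emits (encoded bytes, label) pairs.
    encoded, labels = fn.readers.file(file_root="/data/images")  # placeholder path
    # Decoder immediately after the source; output lands in GPU memory.
    images = fn.decoders.image(encoded, device="mixed", output_type=types.RGB)
    return images, labels
```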
Code Reference
Source Location
- Repository: NVIDIA DALI
- File: docs/examples/zoo/images/decode.py (lines 42-47)
- File: docs/examples/zoo/images/decode_and_transform_pytorch.py (lines 77-83)
Signature
```python
fn.decoders.image(
    inputs,
    device="mixed",
    output_type=types.RGB,
    jpeg_fancy_upsampling=True,
    use_fast_idct=False,
)
```
Import
```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
# or
from nvidia.dali import fn, types
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| inputs | DataNode | Yes | Encoded image bytes as a 1-D uint8 tensor (one per sample in the batch) |
| device | str | No | Device placement: "cpu" (CPU-only decode) or "mixed" (CPU header parse + GPU decode). Default: "cpu" |
| output_type | types.DALIImageType | No | Desired output color format: types.RGB, types.BGR, types.GRAY, types.ANY_DATA. Default: types.RGB |
| jpeg_fancy_upsampling | bool | No | Use high-quality chroma upsampling for JPEG. Default: False |
| use_fast_idct | bool | No | Use faster but less accurate inverse DCT for JPEG. Default: False |
Outputs
| Name | Type | Description |
|---|---|---|
| decoded | DataNode (GPU) | Decoded image tensor with layout [H, W, C] and dtype uint8 (C = 3 for output_type=types.RGB), residing in GPU memory when device="mixed" |
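A small sketch exercising this contract end to end (sample.jpg is a placeholder path): a 1-D uint8 array of encoded bytes goes in, and a 3-channel HWC uint8 tensor comes out on the GPU.

```python
import numpy as np
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def contract_pipe():
    data = fn.external_source(name="encoded", dtype=types.UINT8, ndim=1)
    return fn.decoders.image(data, device="mixed", output_type=types.RGB)

pipe = contract_pipe()
pipe.build()
# Input contract: one 1-D uint8 array of encoded bytes per sample.
pipe.feed_input("encoded", [np.fromfile("sample.jpg", dtype=np.uint8)])  # placeholder
(out,) = pipe.run()           # TensorListGPU: data stays on the device
img = out.as_cpu().at(0)      # copy back only to inspect the contract
assert img.dtype == np.uint8 and img.ndim == 3 and img.shape[2] == 3  # [H, W, 3]
```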
Usage Examples
Example: Basic GPU Decode
```python
import numpy as np
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def(batch_size=4, num_threads=4, device_id=0, exec_dynamic=True)
def decode_pipeline(source_name):
    # Encoded bytes enter the graph on the CPU through a named external source.
    inputs = fn.external_source(
        device="cpu",
        name=source_name,
        no_copy=False,
        blocking=True,
        dtype=types.UINT8,
    )
    # Mixed decode: CPU header parsing, GPU pixel decompression.
    decoded = fn.decoders.image(
        inputs,
        device="mixed",
        output_type=types.RGB,
        jpeg_fancy_upsampling=True,
    )
    return decoded


pipe = decode_pipeline("encoded_img", prefetch_queue_depth=1)
pipe.build()
```
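To drive the pipeline built above, feed the named external source and pull a batch; a sketch, with photo.jpg standing in for any JPEG on disk:

```python
import numpy as np

# One encoded JPEG per sample (batch_size=4 above).
batch = [np.fromfile("photo.jpg", dtype=np.uint8) for _ in range(4)]
pipe.feed_input("encoded_img", batch)
(decoded,) = pipe.run()
print(decoded.shape())  # four (H, W, 3) shapes; the data itself stays on the GPU
```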
Example: Decode with Fast IDCT Disabled
```python
from nvidia.dali import pipeline_def, fn, types


@pipeline_def
def image_pipe(img_hw=(320, 200)):
    # no_copy=True skips the copy on feed_input; the fed buffers must remain
    # alive and unmodified until the batch has been consumed.
    encoded_images = fn.external_source(name="images", no_copy=True)
    decoded = fn.decoders.image(
        encoded_images,
        device="mixed",
        output_type=types.RGB,
        use_fast_idct=False,          # standard-accuracy inverse DCT
        jpeg_fancy_upsampling=True,   # high-quality chroma upsampling
    )
    # fn.resize consumes the GPU-resident output, so it also runs on the GPU.
    images = fn.resize(decoded, size=img_hw, interp_type=types.INTERP_LINEAR)
    return images
```
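A sketch of driving this pipeline (photo.jpg is a placeholder path); since @pipeline_def was applied without arguments, the standard pipeline kwargs are supplied at call time:

```python
import numpy as np

pipe = image_pipe(img_hw=(320, 200), batch_size=2, num_threads=2, device_id=0)
pipe.build()
# With no_copy=True, the fed arrays must stay unmodified until the batch is used.
batch = [np.fromfile("photo.jpg", dtype=np.uint8)] * 2
pipe.feed_input("images", batch)
(resized,) = pipe.run()
print(resized.shape())  # two resized (H, W, 3) shapes, per img_hw
```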