Implementation:NVIDIA DALI NumPy Parser

From Leeroopedia


Knowledge Sources
Domains: Utilities, File_IO
Last Updated: 2026-02-08 16:00 GMT

Overview

Implements NumPy .npy file parsing, header extraction, and tensor reading for loading NumPy arrays into DALI CPU tensors.

Description

The NumPy parser implementation in dali/util/numpy.cc provides routines for reading the NumPy binary file format (NPY version 1). The parser interprets the NPY header to extract the data type descriptor, the memory layout (C-order or Fortran-order), and the tensor shape. Supported element types are boolean, signed and unsigned integers (8 to 64 bits), float16, float32, and float64. Only little-endian and native byte orders are supported.
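As a rough illustration of what "NPY version 1" means at the byte level, the sketch below decodes the fixed 10-byte preamble defined by the NPY format: a 6-byte magic string, two version bytes, and a 2-byte little-endian header length. The names NpyPreamble and ParsePreamble are hypothetical and exist only for this example; this is not DALI's code, which works on an InputStream rather than a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical struct for this sketch; it is not DALI's HeaderData.
struct NpyPreamble {
  uint8_t major = 0;
  uint8_t minor = 0;
  uint16_t header_len = 0;  // length of the dictionary text that follows
  size_t data_offset = 0;   // byte offset of the first array element
};

// Decodes the fixed 10-byte preamble of an NPY version 1 file.
inline NpyPreamble ParsePreamble(const std::string &bytes) {
  static const char kMagic[] = "\x93NUMPY";  // 6 magic bytes
  if (bytes.size() < 10 || std::memcmp(bytes.data(), kMagic, 6) != 0)
    throw std::runtime_error("not an NPY file");
  NpyPreamble p;
  p.major = static_cast<uint8_t>(bytes[6]);
  p.minor = static_cast<uint8_t>(bytes[7]);
  if (p.major != 1)
    throw std::runtime_error("only NPY version 1 is handled in this sketch");
  // Version 1 stores the header length as a 2-byte little-endian integer.
  p.header_len = static_cast<uint16_t>(
      static_cast<uint8_t>(bytes[8]) | (static_cast<uint8_t>(bytes[9]) << 8));
  // The array data starts right after the preamble and the dictionary text.
  p.data_offset = 10 + static_cast<size_t>(p.header_len);
  return p;
}
```

The dictionary text of length header_len follows the preamble, and the raw array bytes start at data_offset.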

The header parsing is performed by ParseHeaderContents, which takes a Python dictionary string from the NPY header and extracts the descr, fortran_order, and shape fields using a hand-written recursive descent parser. Helper functions such as SkipSpaces, Skip, TrySkip, ParseInteger, and ParseStringValue handle the tokenization of the header string. Two entry points for header parsing are provided: ParseHeader for standard file streams and ParseODirectHeader for O_DIRECT aligned file access, which performs aligned reads suitable for direct I/O.
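To make the tokenization approach concrete, here is a minimal, self-contained sketch of a hand-written parser for the header dictionary. The helper names mirror those mentioned above, but their signatures, the ParsedDict struct, and the ParseShape and ParseDict functions are assumptions made for this example and are not taken from the DALI sources. The sketch is also more lenient than a production parser (for instance, it does not require commas between shape extents).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type for this sketch; DALI's HeaderData differs.
struct ParsedDict {
  std::string descr;
  bool fortran_order = false;
  std::vector<int64_t> shape;
};

// Every helper consumes characters from the front of `in`.
inline void SkipSpaces(std::string_view &in) {
  while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
    in.remove_prefix(1);
}

// Consumes `what` if `in` starts with it and reports whether it did.
inline bool TrySkip(std::string_view &in, std::string_view what) {
  if (in.substr(0, what.size()) != what) return false;
  in.remove_prefix(what.size());
  return true;
}

// Like TrySkip, but a mismatch is a parse error.
inline void Skip(std::string_view &in, std::string_view what) {
  if (!TrySkip(in, what))
    throw std::runtime_error("expected '" + std::string(what) + "'");
}

inline int64_t ParseInteger(std::string_view &in) {
  if (in.empty() || !std::isdigit(static_cast<unsigned char>(in.front())))
    throw std::runtime_error("expected a digit");
  int64_t value = 0;
  while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
    value = value * 10 + (in.front() - '0');
    in.remove_prefix(1);
  }
  return value;
}

// Parses a single-quoted string without escape sequences.
inline std::string ParseStringValue(std::string_view &in) {
  Skip(in, "'");
  size_t end = in.find('\'');
  if (end == std::string_view::npos)
    throw std::runtime_error("unterminated string");
  std::string value(in.substr(0, end));
  in.remove_prefix(end + 1);
  return value;
}

// Parses "(d0, d1, ...)", including "()" and the one-element form "(d0,)".
inline std::vector<int64_t> ParseShape(std::string_view &in) {
  std::vector<int64_t> shape;
  Skip(in, "(");
  SkipSpaces(in);
  while (!TrySkip(in, ")")) {
    shape.push_back(ParseInteger(in));
    SkipSpaces(in);
    TrySkip(in, ",");
    SkipSpaces(in);
  }
  return shape;
}

// Top-level rule: '{' (key ':' value ','?)* '}'
inline ParsedDict ParseDict(std::string_view in) {
  ParsedDict out;
  SkipSpaces(in);
  Skip(in, "{");
  SkipSpaces(in);
  while (!TrySkip(in, "}")) {
    std::string key = ParseStringValue(in);
    SkipSpaces(in);
    Skip(in, ":");
    SkipSpaces(in);
    if (key == "descr") {
      out.descr = ParseStringValue(in);
    } else if (key == "fortran_order") {
      out.fortran_order = TrySkip(in, "True");
      if (!out.fortran_order) Skip(in, "False");
    } else if (key == "shape") {
      out.shape = ParseShape(in);
    } else {
      throw std::runtime_error("unexpected key: " + key);
    }
    SkipSpaces(in);
    TrySkip(in, ",");
    SkipSpaces(in);
  }
  return out;
}
```

Each grammar rule maps to one function that advances a shared std::string_view cursor, which keeps the parser free of allocations apart from the extracted values.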

The module also provides ReadTensor, a convenience function that parses the header, reads the raw data into a Tensor<CPUBackend>, and optionally transposes Fortran-order arrays to C-order using the FromFortranOrder function, which delegates to the DALI transpose kernel.
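The effect of the Fortran-order to C-order conversion can be sketched with a plain loop. The function below is a hypothetical stand-in written for this page; the real FromFortranOrder operates on DALI sample views and delegates to the transpose kernel, so only the index arithmetic is comparable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reorders column-major (Fortran) data into
// row-major (C) order for an array with the given logical shape.
template <typename T>
std::vector<T> FortranToCOrder(const std::vector<T> &src,
                               const std::vector<int64_t> &shape) {
  const int ndim = static_cast<int>(shape.size());
  // In Fortran order the first dimension varies fastest, so the stride of
  // dimension d is the product of the extents of all dimensions before it.
  std::vector<int64_t> f_stride(ndim);
  int64_t total = 1;
  for (int d = 0; d < ndim; d++) {
    f_stride[d] = total;
    total *= shape[d];
  }
  assert(static_cast<int64_t>(src.size()) == total);

  std::vector<T> dst(src.size());
  for (int64_t c_index = 0; c_index < total; c_index++) {
    // Decompose the C-order linear index into coordinates (last dimension
    // fastest) and accumulate the matching Fortran-order offset.
    int64_t rest = c_index;
    int64_t f_index = 0;
    for (int d = ndim - 1; d >= 0; d--) {
      f_index += (rest % shape[d]) * f_stride[d];
      rest /= shape[d];
    }
    dst[c_index] = src[f_index];
  }
  return dst;
}
```

For a 2x3 array [[1, 2, 3], [4, 5, 6]], the Fortran-order file stores the columns one after another (1, 4, 2, 5, 3, 6), and the function returns the rows one after another.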

Usage

Use the NumPy parser when loading .npy files into the DALI pipeline. Call ParseHeader to extract metadata from an input stream, then read the raw tensor data starting at the data_offset recorded in the parsed header. For a single-call approach, use ReadTensor, which combines header parsing, data reading, and optional Fortran-order transposition.

Code Reference

Source Location

dali/util/numpy.cc (declarations in dali/util/numpy.h)

Signature

const TypeInfo &TypeFromNumpyStr(const std::string &format);

void ParseHeaderContents(HeaderData& target, const std::string_view header);

void ParseHeader(HeaderData &parsed_header, InputStream *src);

void ParseODirectHeader(HeaderData &parsed_header, InputStream *src,
                        size_t o_direct_alignm, size_t o_direct_read_len_alignm);

void FromFortranOrder(SampleView<CPUBackend> output, ConstSampleView<CPUBackend> input);

Tensor<CPUBackend> ReadTensor(InputStream *src, bool pinned);

DALIDataType HeaderData::type() const;
size_t HeaderData::size() const;
size_t HeaderData::nbytes() const;
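The page does not spell out how size() and nbytes() are computed. A plausible reading, assumed here rather than confirmed by the source, is that size() is the number of elements (the product of the shape extents) and nbytes() is that count multiplied by the element size. The HeaderSketch struct below is hypothetical and only illustrates that arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape-related part of HeaderData.
struct HeaderSketch {
  std::vector<int64_t> shape;
  size_t element_size = 0;  // bytes per element, e.g. 4 for float32

  // Assumed semantics: number of elements. An empty shape describes a
  // 0-d array, which holds exactly one element.
  size_t size() const {
    size_t count = 1;
    for (int64_t extent : shape) count *= static_cast<size_t>(extent);
    return count;
  }

  // Assumed semantics: payload size in bytes.
  size_t nbytes() const { return size() * element_size; }
};
```

Under these assumptions, a float32 array of shape (3, 4) has size() == 12 and nbytes() == 48.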

Import

#include "dali/util/numpy.h"

I/O Contract

Inputs

Name | Type | Required | Description
src | InputStream* | Yes | Input stream positioned at the beginning of the NPY file
parsed_header | HeaderData& | Yes | Output struct to receive parsed metadata
header | std::string_view | Yes (ParseHeaderContents) | Raw NPY header dictionary string to parse
o_direct_alignm | size_t | Yes (ParseODirectHeader) | Memory alignment required for O_DIRECT reads
o_direct_read_len_alignm | size_t | Yes (ParseODirectHeader) | Read length alignment required for O_DIRECT
pinned | bool | Yes (ReadTensor) | Whether to allocate the output tensor in pinned (page-locked) memory
format | std::string | Yes (TypeFromNumpyStr) | NumPy type descriptor string (e.g. "f4", "i8", "u1")
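The two alignment parameters exist because O_DIRECT reads generally require the file offset and the read length (as well as the buffer address) to be multiples of a block size. The sketch below shows only the rounding arithmetic involved. AlignDown, AlignUp, AlignedWindow, and MakeAlignedWindow are hypothetical names for this example and are not DALI functions.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` down to a multiple of `alignment` (alignment > 0).
inline size_t AlignDown(size_t value, size_t alignment) {
  return value / alignment * alignment;
}

// Rounds `value` up to a multiple of `alignment` (alignment > 0).
inline size_t AlignUp(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical description of an aligned read that covers a byte range.
struct AlignedWindow {
  size_t start;   // aligned file offset at which to start reading
  size_t length;  // aligned number of bytes to read
  size_t skip;    // bytes to discard at the front of the buffer
};

// Expands the requested range [offset, offset + len) to aligned bounds.
inline AlignedWindow MakeAlignedWindow(size_t offset, size_t len,
                                       size_t alignment) {
  assert(alignment > 0);
  AlignedWindow w;
  w.start = AlignDown(offset, alignment);
  w.length = AlignUp(offset + len, alignment) - w.start;
  w.skip = offset - w.start;
  return w;
}
```

For example, reading 100 bytes at offset 600 with a 512-byte alignment becomes a 512-byte read at offset 512, of which the first 88 bytes are skipped.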

Outputs

Name | Type | Description
parsed_header | HeaderData | Populated struct with shape, type_info, fortran_order flag, and data_offset
return value (ReadTensor) | Tensor<CPUBackend> | CPU tensor containing the loaded array data in C-order
return value (TypeFromNumpyStr) | const TypeInfo& | DALI TypeInfo reference corresponding to the NumPy type string
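To illustrate the kind of mapping TypeFromNumpyStr performs, the sketch below decodes a NumPy descriptor such as "<f4" or "|u1" into a kind and an element size. It returns a hypothetical ElemType struct rather than a DALI TypeInfo, and accepting an optional byte-order prefix is a choice made for this example. The set of accepted types follows the list given in the Description.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical types for this sketch; DALI returns a TypeInfo instead.
enum class ElemKind { Bool, UInt, Int, Float };
struct ElemType {
  ElemKind kind;
  size_t bytes;
};

inline ElemType TypeFromDescr(std::string_view descr) {
  if (descr.empty()) throw std::runtime_error("empty type descriptor");
  // Optional byte-order prefix: '<' little-endian, '=' native,
  // '|' not applicable (single-byte types), '>' big-endian.
  const char order = descr.front();
  if (order == '>') throw std::runtime_error("big-endian is not supported");
  if (order == '<' || order == '=' || order == '|') descr.remove_prefix(1);
  if (descr.size() != 2)
    throw std::runtime_error("unsupported type: " + std::string(descr));

  const char kind = descr[0];
  const int bytes = descr[1] - '0';
  const bool int_width = bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8;
  const bool float_width = bytes == 2 || bytes == 4 || bytes == 8;
  if (kind == 'b' && bytes == 1) return {ElemKind::Bool, 1};
  if (kind == 'u' && int_width) return {ElemKind::UInt, static_cast<size_t>(bytes)};
  if (kind == 'i' && int_width) return {ElemKind::Int, static_cast<size_t>(bytes)};
  if (kind == 'f' && float_width) return {ElemKind::Float, static_cast<size_t>(bytes)};
  throw std::runtime_error("unsupported type: " + std::string(descr));
}
```

Treating '=' like '<' assumes a little-endian host, which matches the byte orders listed as supported.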

Usage Examples

Reading a NumPy File into a Tensor

#include "dali/util/file.h"   // FileStream
#include "dali/util/numpy.h"

// Open an InputStream to the .npy file
auto stream = dali::FileStream::Open("data.npy", false, false);

// Read the entire tensor (handles header parsing, data reading, and Fortran-order transposition)
auto tensor = dali::numpy::ReadTensor(stream.get(), /*pinned=*/false);

// tensor.shape() returns the shape
// tensor.type() returns the DALIDataType

Parsing a NumPy Header Separately

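#include "dali/util/file.h"   // FileStream
#include "dali/util/numpy.h"

auto stream = dali::FileStream::Open("array.npy", false, false);

// Parse only the header; the array data itself is not read
dali::numpy::HeaderData header;
dali::numpy::ParseHeader(header, stream.get());

// Access parsed metadata
auto dtype = header.type();               // DALIDataType of the elements
auto shape = header.shape;                // tensor shape
bool is_fortran = header.fortran_order;   // true if the data is stored in Fortran order
int64_t data_start = header.data_offset;  // byte offset of the raw array data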
