Principle:LaurentMazare Tch rs NumPy Format IO

From Leeroopedia


Knowledge Sources
Domains: Data Serialization, Interoperability, File Formats
Last Updated: 2026-02-08 00:00 GMT

Overview

The NumPy file format provides a standardized binary encoding for array data with self-describing headers, enabling efficient cross-language data exchange between scientific computing ecosystems.

Description

The NumPy file format is a widely used binary format for persisting array (tensor) data. It comes in two forms:

.npy files store a single array with a self-describing header. The header contains all metadata needed to reconstruct the array:

  • Data type (dtype) -- The element type and byte order (e.g., float32 little-endian)
  • Shape -- The dimensions of the array as a tuple of integers
  • Memory order -- Whether the data is stored in C (row-major) or Fortran (column-major) order

The header is followed by the raw array data as a contiguous block of bytes. This simple structure makes the format easy to implement in any language, not just Python.

.npz files store multiple named arrays in a zip archive. Each entry in the archive is a .npy file with a name corresponding to the array's key. This enables saving and loading collections of arrays (e.g., all parameters of a neural network) in a single file.

The format's key advantages are:

  • Self-describing -- No external schema needed; the header encodes all metadata
  • Language-agnostic -- Simple enough to implement in any language with binary I/O
  • Efficient -- Data is stored in raw binary form with minimal overhead
  • Widely supported -- De facto standard for array interchange in scientific computing

Usage

Apply NumPy format I/O when:

  • Exchanging tensor data between different programming languages or frameworks
  • Persisting preprocessed datasets or model weights in a portable format
  • Loading data that was generated by Python/NumPy pipelines
  • Saving intermediate computation results for later analysis

Theoretical Basis

.npy File Structure

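A .npy file consists of three sections:

  1. Magic bytes -- 6 bytes identifying the format: \x93NUMPY (the byte 0x93 followed by the ASCII string "NUMPY"). They are followed by a 2-byte format version and a little-endian header length (2 bytes in version 1.0, 4 bytes in version 2.0 and later)
  2. Header -- A Python dictionary literal (as ASCII text), padded with spaces and terminated by a newline, specifying:
    • descr: dtype string (e.g., '<f4' for little-endian float32)
    • fortran_order: boolean for memory layout (True for Fortran order, False for C order)
    • shape: tuple of dimension sizes
  3. Data -- Raw binary array data, $\prod_i d_i \times \mathrm{sizeof}(\mathrm{dtype})$ bytes, where $d_i$ are the entries of shape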
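The layout above can be sketched with nothing but the Python standard library. This is a minimal illustration, assuming the version 1.0 layout (6 magic bytes, a 2-byte version, a 2-byte little-endian header length, then the header and data). The helper names build_npy and parse_npy are hypothetical and are not part of NumPy or tch-rs.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def build_npy(descr, shape, payload, fortran_order=False):
    """Serialize raw bytes as a version 1.0 .npy file (hypothetical helper)."""
    header = repr({"descr": descr, "fortran_order": fortran_order, "shape": tuple(shape)})
    # Preamble = magic (6) + version (2) + header length (2) = 10 bytes.
    # Pad with spaces so preamble + header (ending in "\n") is a multiple of 64.
    unpadded = 10 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    encoded = header.encode("ascii")
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(encoded)) + encoded + payload

def parse_npy(blob):
    """Split a version 1.0 .npy byte string into (header dict, raw data bytes)."""
    if blob[:6] != MAGIC:
        raise ValueError("not a .npy file")
    if (blob[6], blob[7]) != (1, 0):
        raise ValueError("this sketch only handles format version 1.0")
    (header_len,) = struct.unpack("<H", blob[8:10])
    # The header is a Python dict literal, so literal_eval parses it safely.
    header = ast.literal_eval(blob[10:10 + header_len].decode("ascii"))
    return header, blob[10 + header_len:]

# Round trip: six little-endian float32 values stored as a 2 x 3 array.
payload = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
blob = build_npy("<f4", (2, 3), payload)
header, data = parse_npy(blob)
values = struct.unpack("<6f", data)
```

Because the header is parsed with ast.literal_eval rather than eval, a malformed or hostile file cannot execute code during loading.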

Dtype Encoding

The dtype string encodes byte order, type character, and byte size:

  • Byte order: '<' (little-endian), '>' (big-endian), '=' (native), '|' (not applicable, used for single-byte types such as '|b1')
  • Type character: 'f' (float), 'i' (signed int), 'u' (unsigned int), 'b' (bool)
  • Byte count: number of bytes per element

Example: '<f4' means little-endian 32-bit float (4 bytes).
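As a rough illustration, a dtype string of this shape can be split into its three parts with a regular expression. The sketch below covers only the type characters listed above and also accepts '|', which NumPy writes for single-byte types where byte order is irrelevant. The function name parse_descr is hypothetical.

```python
import re

BYTE_ORDERS = {
    "<": "little-endian",
    ">": "big-endian",
    "=": "native",
    "|": "not applicable",
}
TYPE_KINDS = {"f": "float", "i": "signed int", "u": "unsigned int", "b": "bool"}

def parse_descr(descr):
    """Return (byte order, type kind, bytes per element) for a dtype string."""
    match = re.fullmatch(r"([<>=|])([fiub])(\d+)", descr)
    if match is None:
        raise ValueError(f"unsupported dtype string: {descr!r}")
    order, kind, size = match.groups()
    return BYTE_ORDERS[order], TYPE_KINDS[kind], int(size)

float32_info = parse_descr("<f4")
int64_info = parse_descr(">i8")
bool_info = parse_descr("|b1")
```

Structured, complex, and string dtypes use other descriptors and would need additional handling.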

.npz Archive Structure

An .npz file is a standard zip archive where:

  • Each entry has a filename like array_name.npy
  • Each entry's content is a complete .npy file
  • Arrays can be accessed by name without loading the entire archive
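Since an .npz file is an ordinary zip archive, its entries can be listed and read individually with a generic zip library. The sketch below uses Python's zipfile module on an in-memory archive. The entry contents are placeholder bytes rather than real .npy data, and the helper names list_npz and read_npz_entry are hypothetical.

```python
import io
import zipfile

def list_npz(source):
    """Return the array keys stored in an .npz archive (entry names minus '.npy')."""
    with zipfile.ZipFile(source) as archive:
        return [name[:-4] for name in archive.namelist() if name.endswith(".npy")]

def read_npz_entry(source, key):
    """Return the raw .npy bytes of one entry without reading the other entries."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(key + ".npy")

# Build a small archive in memory; the payloads are placeholders, not valid arrays.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("weights.npy", b"placeholder bytes for weights")
    archive.writestr("bias.npy", b"placeholder bytes for bias")

names = list_npz(buffer)
bias_bytes = read_npz_entry(buffer, "bias")
```

In a real archive, each value returned by read_npz_entry would be a complete .npy file that can be handed to a .npy parser.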

Memory Layout

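For an array with shape $(d_1, d_2, \ldots, d_n)$:

C order (row-major): The last index varies fastest. Element at index $(i_1, \ldots, i_n)$ has byte offset:

$$\text{offset} = \left(\sum_{k=1}^{n} i_k \prod_{j=k+1}^{n} d_j\right) \times \mathrm{sizeof}(\mathrm{dtype})$$

Fortran order (column-major): The first index varies fastest. Element byte offset:

$$\text{offset} = \left(\sum_{k=1}^{n} i_k \prod_{j=1}^{k-1} d_j\right) \times \mathrm{sizeof}(\mathrm{dtype})$$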
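The two offset formulas can be checked with a short sketch. It assumes zero-based indices and offsets measured from the start of the data section. The function names are hypothetical.

```python
from math import prod

def c_order_offset(index, shape, itemsize):
    """Byte offset in row-major order: index k is scaled by the sizes after it."""
    return itemsize * sum(
        index[k] * prod(shape[k + 1:]) for k in range(len(shape))
    )

def fortran_order_offset(index, shape, itemsize):
    """Byte offset in column-major order: index k is scaled by the sizes before it."""
    return itemsize * sum(
        index[k] * prod(shape[:k]) for k in range(len(shape))
    )

# Element (0, 1) of a 2 x 3 float32 array (4 bytes per element).
c_offset = c_order_offset((0, 1), (2, 3), 4)        # 1 element in -> 4 bytes
f_offset = fortran_order_offset((0, 1), (2, 3), 4)  # 2 elements in -> 8 bytes
```

The same element lands at different offsets in the two layouts, which is why a reader must honor the fortran_order flag rather than assume row-major data.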
