Principle:LaurentMazare Tch rs NumPy Format IO
| Knowledge Sources | |
|---|---|
| Domains | Data Serialization, Interoperability, File Formats |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The NumPy file format provides a standardized binary encoding for array data with self-describing headers, enabling efficient cross-language data exchange between scientific computing ecosystems.
Description
The NumPy file format is a widely-used binary format for persisting array (tensor) data. It comes in two forms:
.npy files store a single array with a self-describing header. The header contains all metadata needed to reconstruct the array:
- Data type (dtype) -- The element type and byte order (e.g., float32 little-endian)
- Shape -- The dimensions of the array as a tuple of integers
- Memory order -- Whether the data is stored in C (row-major) or Fortran (column-major) order
The header is followed by the raw array data as a contiguous block of bytes. This simple structure makes the format easy to implement in any language, not just Python.
.npz files store multiple named arrays in a zip archive. Each entry in the archive is a .npy file with a name corresponding to the array's key. This enables saving and loading collections of arrays (e.g., all parameters of a neural network) in a single file.
The format's key advantages are:
- Self-describing -- No external schema needed; the header encodes all metadata
- Language-agnostic -- Simple enough to implement in any language with binary I/O
- Efficient -- Data is stored in raw binary form with minimal overhead
- Widely supported -- De facto standard for array interchange in scientific computing
Usage
Apply NumPy format I/O when:
- Exchanging tensor data between different programming languages or frameworks
- Persisting preprocessed datasets or model weights in a portable format
- Loading data that was generated by Python/NumPy pipelines
- Saving intermediate computation results for later analysis
Theoretical Basis
.npy File Structure
A .npy file consists of three sections:
- Magic bytes -- 6 bytes identifying the format: Failed to parse (syntax error): {\displaystyle \texttt{\\x93NUMPY}}
- Header -- A Python dictionary literal (as ASCII text) specifying:
- : dtype string (e.g., '<f4' for little-endian float32)
- : boolean for memory layout
- : tuple of dimension sizes
- Data -- Raw binary array data, bytes
Dtype Encoding
The dtype string encodes byte order, type character, and byte size:
- Byte order: '<' (little-endian), '>' (big-endian), '=' (native)
- Type character: 'f' (float), 'i' (signed int), 'u' (unsigned int), 'b' (bool)
- Byte count: number of bytes per element
Example: '<f4' means little-endian 32-bit float (4 bytes).
.npz Archive Structure
An .npz file is a standard zip archive where:
- Each entry has a filename like
- Each entry's content is a complete .npy file
- Arrays can be accessed by name without loading the entire archive
Memory Layout
For an array with shape :
C order (row-major): The last index varies fastest. Element at index has byte offset:
Fortran order (column-major): The first index varies fastest. Element byte offset: