

Implementation:Huggingface Datasets Nifti

From Leeroopedia
Revision as of 13:00, 16 February 2026 by Admin (auto-imported from implementations/Huggingface_Datasets_Nifti.md)
Source: src/datasets/features/nifti.py (lines 63-301)
Domain(s): Medical_Imaging, Data_Processing
Last Updated: 2026-02-14

Overview

Description

Nifti is an experimental feature type (a dataclass) in the HuggingFace Datasets library for handling NIfTI (Neuroimaging Informatics Technology Initiative) files. NIfTI is a standard file format in neuroimaging research. NIfTI files are commonly produced by converting MRI scanner output and by neuroimaging analysis software.

The Nifti feature type lets datasets store, encode, and decode NIfTI files. It follows the same bytes-or-path pattern as the library's other binary feature types (such as Image and Audio): internally, each NIfTI value is stored in an Arrow struct with a bytes field and a path field. When decoding is enabled, the feature returns a Nifti1ImageWrapper object (a subclass of nibabel.nifti1.Nifti1Image) that supports interactive 3D rendering in Jupyter notebooks via ipyniivue.
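
To make the bytes-or-path pattern concrete, here is a minimal plain-Python sketch of one stored record and of how a decoder might choose its data source. NiftiRecord and read_raw_nifti are hypothetical names used only for illustration. The rule that inline bytes take precedence over the path is an assumption of this sketch. The real decode_example also handles Hub URLs, tokens, and nibabel parsing, none of which appear here.

```python
from typing import Optional, TypedDict

# Hypothetical stand-in for one row of the Arrow struct {"bytes": binary, "path": string}.
NiftiRecord = TypedDict("NiftiRecord", {"bytes": Optional[bytes], "path": Optional[str]})


def read_raw_nifti(record: NiftiRecord) -> bytes:
    """Return the raw file content a decoder would parse (illustrative sketch).

    Assumption: inline bytes win; the path is only opened when no bytes are stored.
    """
    if record["bytes"] is not None:
        return record["bytes"]
    if record["path"] is None:
        raise ValueError("a NIfTI record needs either 'bytes' or 'path'")
    # Lazy loading: the file on disk is only read at this point.
    with open(record["path"], "rb") as f:
        return f.read()


# A record produced from in-memory content carries bytes and no path.
in_memory = NiftiRecord(bytes=b"\x5c\x01\x00\x00", path=None)
raw = read_raw_nifti(in_memory)
```

A record built from a file path would instead hold {"bytes": None, "path": "..."}, and the file is only opened when the record is actually decoded.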

The class supports lazy loading from local paths, streaming from the HuggingFace Hub with token-based authentication, gzip-compressed NIfTI files (.nii.gz), and embedding of remote file bytes into Arrow storage.
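
Gzip-compressed files can be recognized by their two-byte magic number (0x1f 0x8b). The sketch below assumes single-file NIfTI-1 input and uses only the standard library. It decompresses .nii.gz content when needed and reads the volume shape from the 348-byte NIfTI-1 header. Per the NIfTI-1 layout, sizeof_hdr is at offset 0, the dim array is at offset 40, and the magic string is at offset 344. The helper names are hypothetical; in the library itself, parsing is delegated to nibabel.

```python
import gzip
import struct
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"       # first two bytes of every gzip stream
NIFTI1_HEADER_SIZE = 348       # value of sizeof_hdr in a NIfTI-1 header
NIFTI1_MAGICS = (b"n+1\x00", b"ni1\x00")  # single-file (.nii) / pair (.hdr+.img)


def maybe_gunzip(raw: bytes) -> bytes:
    """Decompress .nii.gz content; pass plain .nii content through unchanged."""
    return gzip.decompress(raw) if raw[:2] == GZIP_MAGIC else raw


def nifti1_shape(raw: bytes) -> Tuple[int, ...]:
    """Read the image shape from a (possibly gzipped) NIfTI-1 header (sketch)."""
    data = maybe_gunzip(raw)
    # sizeof_hdr (int32 at offset 0) must equal 348; it also reveals byte order.
    if struct.unpack_from("<i", data, 0)[0] == NIFTI1_HEADER_SIZE:
        endian = "<"
    elif struct.unpack_from(">i", data, 0)[0] == NIFTI1_HEADER_SIZE:
        endian = ">"
    else:
        raise ValueError("sizeof_hdr is not 348: not a NIfTI-1 header")
    if data[344:348] not in NIFTI1_MAGICS:
        raise ValueError("missing NIfTI-1 magic string")
    # dim is int16[8] at offset 40; dim[0] holds the number of dimensions.
    dim = struct.unpack_from(endian + "8h", data, 40)
    return tuple(dim[1 : dim[0] + 1])


# Build a synthetic little-endian header for a 64 x 64 x 32 volume.
header = bytearray(352)
struct.pack_into("<i", header, 0, NIFTI1_HEADER_SIZE)
struct.pack_into("<8h", header, 40, 3, 64, 64, 32, 1, 1, 1, 1)
header[344:348] = b"n+1\x00"
plain_shape = nifti1_shape(bytes(header))
gzipped_shape = nifti1_shape(gzip.compress(bytes(header)))
```

Both calls should yield (64, 64, 32), because the gzip layer is removed before any header field is read.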

Usage

Use the Nifti feature type when building or consuming datasets that contain NIfTI neuroimaging files. Common scenarios include:

  • Creating medical imaging datasets with brain MRI scans.
  • Loading existing NIfTI-based datasets from the HuggingFace Hub.
  • Casting a column of file paths to a decoded NIfTI feature for interactive inspection.
  • Toggling decoding off (decode=False) to retrieve raw path/bytes metadata instead of loaded images.

Code Reference

Source Location

Repository: huggingface/datasets

File: src/datasets/features/nifti.py (lines 63-301)

Signature

@dataclass
class Nifti:
    decode: bool = True
    id: Optional[str] = None

    # Class variables
    dtype: ClassVar[str] = "nibabel.nifti1.Nifti1Image"
    pa_type: ClassVar[Any] = pa.struct({"bytes": pa.binary(), "path": pa.string()})

Key Methods:

  • encode_example(self, value: Union[str, bytes, bytearray, dict, nib.Nifti1Image]) -> dict -- Encodes an input value into the Arrow-compatible {"path": ..., "bytes": ...} dictionary format.
  • decode_example(self, value: dict, token_per_repo_id=None) -> Nifti1ImageWrapper -- Decodes an Arrow-stored dictionary back into a Nifti1ImageWrapper object, handling local paths, remote Hub URLs, gzip decompression, and raw bytes.
  • embed_storage(self, storage: pa.StructArray, token_per_repo_id=None) -> pa.StructArray -- Downloads and embeds remote NIfTI file bytes directly into the Arrow array, replacing path references with inline bytes.
  • flatten(self) -> Union[FeatureType, Dict[str, FeatureType]] -- Returns self when decoding is enabled, otherwise flattens to {"bytes": Value("binary"), "path": Value("string")}.
  • cast_storage(self, storage: Union[pa.StringArray, pa.StructArray, pa.BinaryArray]) -> pa.StructArray -- Casts various Arrow array types (pa.string(), pa.binary(), or pa.struct) into the canonical Nifti storage struct.
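
The normalization that cast_storage and embed_storage perform on Arrow arrays can be approximated over plain Python lists. This is a sketch under stated assumptions, not the library code. cast_to_nifti_records and embed_records are hypothetical names. embed_records only reads local files, whereas the real method can also fetch remote files with a token. Keeping just the file name after embedding is also an assumption of this sketch.

```python
import os
from typing import Any, Dict, List, Optional, Union

Record = Dict[str, Optional[Union[bytes, str]]]


def cast_to_nifti_records(values: List[Any]) -> List[Optional[Record]]:
    """Normalize paths, raw bytes, or partial dicts into {"bytes", "path"} rows."""
    out: List[Optional[Record]] = []
    for v in values:
        if v is None:
            out.append(None)                      # null rows stay null
        elif isinstance(v, str):                  # like a pa.string() column
            out.append({"bytes": None, "path": v})
        elif isinstance(v, (bytes, bytearray)):   # like a pa.binary() column
            out.append({"bytes": bytes(v), "path": None})
        elif isinstance(v, dict):                 # like a pa.struct column
            # Missing fields are filled with None so every row has both keys.
            out.append({"bytes": v.get("bytes"), "path": v.get("path")})
        else:
            raise TypeError(f"cannot cast {type(v).__name__} to a NIfTI record")
    return out


def embed_records(records: List[Optional[Record]]) -> List[Optional[Record]]:
    """Replace local path references with inline bytes (hypothetical helper)."""
    embedded: List[Optional[Record]] = []
    for r in records:
        # Rows that are null, already hold bytes, or have no path are kept as-is.
        if r is None or r["bytes"] is not None or r["path"] is None:
            embedded.append(r)
            continue
        path = str(r["path"])
        with open(path, "rb") as f:
            # Assumption: keep only the file name (and thus its extension).
            embedded.append({"bytes": f.read(), "path": os.path.basename(path)})
    return embedded
```

After embedding, a dataset no longer depends on the original files being reachable, which is what makes it portable.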

Import

from datasets import Nifti
# or
from datasets.features import Nifti

I/O Contract

Inputs

  • decode (bool, default: True) -- Whether to decode NIfTI data into Nifti1ImageWrapper objects on access. Set to False to return raw path/bytes dicts.
  • id (Optional[str], default: None) -- Optional identifier for the feature.

encode_example accepts:

  • str -- Absolute path to a NIfTI file.
  • pathlib.Path -- Path object pointing to a NIfTI file.
  • bytes / bytearray -- Raw NIfTI file content.
  • dict -- Dictionary with "path" and/or "bytes" keys.
  • nib.Nifti1Image -- A nibabel spatial image object.
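
A dispatch over these input types might look like the sketch below. encode_nifti_value is a hypothetical helper. The nibabel branch is left out so the example runs without nibabel installed. Converting bytearray to bytes and storing Path objects as absolute path strings are choices made for this sketch.

```python
from pathlib import Path
from typing import Optional, Union


def encode_nifti_value(value: Union[str, Path, bytes, bytearray, dict]) -> dict:
    """Map a user-supplied value to {"path": ..., "bytes": ...} (sketch)."""
    if isinstance(value, str):
        return {"path": value, "bytes": None}
    if isinstance(value, Path):
        # Assumption: Path objects are stored as absolute path strings.
        return {"path": str(value.absolute()), "bytes": None}
    if isinstance(value, (bytes, bytearray)):
        return {"path": None, "bytes": bytes(value)}
    if isinstance(value, dict):
        path: Optional[str] = value.get("path")
        data: Optional[bytes] = value.get("bytes")
        if path is None and data is None:
            raise ValueError("dict input needs a 'path' or a 'bytes' entry")
        return {"path": path, "bytes": data}
    # nibabel image objects are intentionally omitted to stay dependency-free.
    raise TypeError(f"unsupported input type: {type(value).__name__}")
```

The dict branch keeps whichever entries are present, so {"bytes": ...} on its own is accepted, while a dict with neither entry is rejected.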

Outputs

  • decode=True -- Nifti1ImageWrapper: a nibabel Nifti1Image subclass with Jupyter HTML rendering support via ipyniivue.
  • decode=False -- dict: a dictionary with "bytes" (Optional[bytes]) and "path" (Optional[str]) keys.

Usage Examples

Creating a dataset with NIfTI files

from datasets import Dataset, Nifti

# Create a dataset from file paths
ds = Dataset.from_dict({
    "nifti": ["path/to/brain_scan1.nii.gz", "path/to/brain_scan2.nii.gz"]
}).cast_column("nifti", Nifti())

# Access a decoded NIfTI image
nifti_image = ds[0]["nifti"]
print(type(nifti_image))  # Nifti1ImageWrapper (subclass of Nifti1Image)
print(nifti_image.shape)  # e.g., (256, 256, 176)

Disabling decoding for raw metadata

from datasets import Dataset, Nifti

ds = Dataset.from_dict({
    "nifti": ["path/to/brain_scan.nii.gz"]
}).cast_column("nifti", Nifti(decode=False))

# Returns raw path/bytes dict instead of a nibabel object
print(ds[0]["nifti"])
# {'bytes': None, 'path': 'path/to/brain_scan.nii.gz'}

Encoding a nibabel image object directly

import nibabel as nib
import numpy as np
from datasets import Dataset, Nifti

# Create a NIfTI image programmatically
data = np.random.rand(64, 64, 32).astype(np.float32)
affine = np.eye(4)
img = nib.Nifti1Image(data, affine)

# Encode into a dataset
nifti_feature = Nifti()
encoded = nifti_feature.encode_example(img)
ds = Dataset.from_dict({"scan": [encoded]}).cast_column("scan", Nifti())

Related Pages

Principles

  • NIfTI Feature Handling -- Principle for encoding, decoding, and managing NIfTI neuroimaging data within HuggingFace Datasets.

Environments

Related Implementations

  • Image -- Similar binary feature type for image data, following the same bytes/path storage pattern.
  • Audio -- Similar binary feature type for audio data.
