Implementation:Huggingface Datasets Nifti
| Source | src/datasets/features/nifti.py (lines 63-301) |
|---|---|
| Domain(s) | Medical_Imaging, Data_Processing |
| Last Updated | 2026-02-14 |
Overview
Description
Nifti is an experimental feature type (dataclass) in the HuggingFace Datasets library for handling NIfTI (Neuroimaging Informatics Technology Initiative) files. NIfTI is a widely used file format in neuroimaging research, typically written by neuroimaging analysis software and by tools that convert MRI scanner output.
The Nifti feature type enables datasets to store, encode, and decode NIfTI files seamlessly. It follows the same bytes-or-path pattern used by other binary feature types in the library (such as Image and Audio). Internally, NIfTI data is stored in an Arrow struct with bytes and path fields. When decoding is enabled, the feature returns a Nifti1ImageWrapper object (a subclass of nibabel.nifti1.Nifti1Image) that supports interactive 3D rendering in Jupyter notebooks via ipyniivue.
The class supports lazy loading from local paths, streaming from the HuggingFace Hub with token-based authentication, gzip-compressed NIfTI files (.nii.gz), and embedding of remote file bytes into Arrow storage.
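To make the gzip case concrete, here is a minimal standard-library sketch that sniffs whether a byte string is gzip-compressed and then reads a few NIfTI-1 header fields. It is only an illustration of the file layout: the function name read_nifti1_header is hypothetical, the library is not known to work this way (per the description above, decoding yields a nibabel-based object), and the demo header is synthetic rather than a real scan.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
NIFTI1_HEADER_SIZE = 348      # fixed size of a NIfTI-1 header


def read_nifti1_header(raw: bytes) -> dict:
    """Return the shape and magic string from .nii or .nii.gz bytes (hypothetical helper)."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError("too short to hold a NIfTI-1 header")

    # sizeof_hdr is a 4-byte int at offset 0 and must equal 348;
    # trying both byte orders reveals the endianness of the file.
    for endian in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(endian + "i", raw, 0)
        if sizeof_hdr == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")

    # dim[0] is the number of dimensions, dim[1:] the extent of each one.
    dim = struct.unpack_from(endian + "8h", raw, 40)
    magic = raw[344:348]
    if magic not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("unexpected NIfTI-1 magic string")
    return {"shape": dim[1 : 1 + dim[0]], "magic": magic}


# Build a synthetic little-endian header describing a 64 x 64 x 32 volume.
header = bytearray(NIFTI1_HEADER_SIZE)
struct.pack_into("<i", header, 0, NIFTI1_HEADER_SIZE)
struct.pack_into("<8h", header, 40, 3, 64, 64, 32, 1, 1, 1, 1)
header[344:348] = b"n+1\x00"

plain = bytes(header)
compressed = gzip.compress(plain)
print(read_nifti1_header(plain)["shape"])
print(read_nifti1_header(compressed)["shape"])
```

Both calls report the same shape, which is the point: once the gzip layer is removed, a .nii.gz payload is an ordinary NIfTI byte stream.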
Usage
Use the Nifti feature type when building or consuming datasets that contain NIfTI neuroimaging files. Common scenarios include:
- Creating medical imaging datasets with brain MRI scans.
- Loading existing NIfTI-based datasets from the HuggingFace Hub.
- Casting a column of file paths to a decoded NIfTI feature for interactive inspection.
- Toggling decoding off (decode=False) to retrieve raw path/bytes metadata instead of loaded images.
Code Reference
Source Location
Repository: huggingface/datasets
File: src/datasets/features/nifti.py (lines 63-301)
Signature
@dataclass
class Nifti:
    decode: bool = True
    id: Optional[str] = None
    # Class variables
    dtype: ClassVar[str] = "nibabel.nifti1.Nifti1Image"
    pa_type: ClassVar[Any] = pa.struct({"bytes": pa.binary(), "path": pa.string()})
Key Methods:
- encode_example(self, value: Union[str, bytes, bytearray, dict, nib.Nifti1Image]) -> dict -- Encodes an input value into the Arrow-compatible {"path": ..., "bytes": ...} dictionary format.
- decode_example(self, value: dict, token_per_repo_id=None) -> Nifti1ImageWrapper -- Decodes an Arrow-stored dictionary back into a Nifti1ImageWrapper object, handling local paths, remote Hub URLs, gzip decompression, and raw bytes.
- embed_storage(self, storage: pa.StructArray, token_per_repo_id=None) -> pa.StructArray -- Downloads and embeds remote NIfTI file bytes directly into the Arrow array, replacing path references with inline bytes.
- flatten(self) -> Union[FeatureType, Dict[str, FeatureType]] -- Returns self when decoding is enabled, otherwise flattens to {"bytes": Value("binary"), "path": Value("string")}.
- cast_storage(self, storage: Union[pa.StringArray, pa.StructArray, pa.BinaryArray]) -> pa.StructArray -- Casts various Arrow array types (pa.string(), pa.binary(), or pa.struct) into the canonical Nifti storage struct.
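The behaviour described for flatten and cast_storage can be pictured with a toy model. The sketch below is not the library class: ToyNifti is a hypothetical stand-in, plain Python lists stand in for Arrow arrays, and the strings "binary" and "string" stand in for Value("binary") and Value("string"). A real Arrow column has a single type, whereas this toy normalizes cell by cell for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Cell = Union[None, str, bytes, dict]


@dataclass
class ToyNifti:
    """Hypothetical stand-in that mimics two methods of the feature type."""

    decode: bool = True
    id: Optional[str] = None

    def flatten(self):
        # With decoding on, the feature stays a single opaque column.
        if self.decode:
            return self
        # With decoding off, it is exposed as two plain sub-columns.
        return {"bytes": "binary", "path": "string"}

    def cast_storage(self, column: List[Cell]) -> List[Optional[dict]]:
        """Normalize strings, bytes, or dicts into {"bytes", "path"} records."""
        out: List[Optional[dict]] = []
        for cell in column:
            if cell is None:
                out.append(None)  # nulls stay null
            elif isinstance(cell, str):
                out.append({"bytes": None, "path": cell})  # string -> path
            elif isinstance(cell, bytes):
                out.append({"bytes": cell, "path": None})  # binary -> bytes
            else:
                out.append({"bytes": cell.get("bytes"), "path": cell.get("path")})
        return out


feature = ToyNifti(decode=False)
print(feature.flatten())
print(feature.cast_storage(["scan.nii.gz", None, b"raw"]))
```

Whatever the input column type, every non-null record ends up with the same two keys, which is what lets downstream code treat the column uniformly.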
Import
from datasets import Nifti
# or
from datasets.features import Nifti
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| decode | bool (default: True) | Whether to decode NIfTI data into Nifti1ImageWrapper objects on access. Set to False to return raw path/bytes dicts. |
| id | Optional[str] (default: None) | Optional identifier for the feature. |
encode_example accepts:
| Input Type | Description |
|---|---|
| str | Absolute path to a NIfTI file. |
| pathlib.Path | Path object pointing to a NIfTI file. |
| bytes / bytearray | Raw NIfTI file content. |
| dict | Dictionary with "path" and/or "bytes" keys. |
| nib.Nifti1Image | A nibabel spatial image object. |
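As a rough illustration of how these input types could map onto the {"path": ..., "bytes": ...} dictionary, consider the sketch below. The function name to_storage_dict is hypothetical and this is not the library's encode_example; the nib.Nifti1Image case is left out because serializing an image requires nibabel.

```python
from pathlib import Path
from typing import Any, Dict


def to_storage_dict(value: Any) -> Dict[str, Any]:
    """Map a supported input onto a path/bytes record (illustrative only)."""
    if isinstance(value, str):
        # A string is treated as a file path; no bytes are read here.
        return {"path": value, "bytes": None}
    if isinstance(value, Path):
        # Assumption: Path objects are stored as absolute path strings.
        return {"path": str(value.absolute()), "bytes": None}
    if isinstance(value, (bytes, bytearray)):
        # Raw file content is stored inline, without a path.
        return {"path": None, "bytes": bytes(value)}
    if isinstance(value, dict):
        path, data = value.get("path"), value.get("bytes")
        if path is None and data is None:
            raise ValueError("dict input needs a 'path' or 'bytes' entry")
        return {"path": path, "bytes": data}
    raise TypeError(f"unsupported input type: {type(value).__name__}")


print(to_storage_dict("scans/brain.nii.gz"))
print(to_storage_dict(bytearray(b"raw-content")))
print(to_storage_dict({"path": "scans/brain.nii.gz"}))
```

Deferring the file read for path inputs is what makes the lazy-loading behaviour possible: the bytes only need to exist when the example is decoded or embedded.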
Outputs
| Condition | Output Type | Description |
|---|---|---|
| decode=True | Nifti1ImageWrapper | A nibabel Nifti1Image subclass with Jupyter HTML rendering support via ipyniivue. |
| decode=False | dict | A dictionary with "bytes" (Optional[bytes]) and "path" (Optional[str]) keys. |
Usage Examples
Creating a dataset with NIfTI files
from datasets import Dataset, Nifti
# Create a dataset from file paths
ds = Dataset.from_dict({
    "nifti": ["path/to/brain_scan1.nii.gz", "path/to/brain_scan2.nii.gz"]
}).cast_column("nifti", Nifti())
# Access a decoded NIfTI image
nifti_image = ds[0]["nifti"]
print(type(nifti_image)) # Nifti1ImageWrapper (subclass of Nifti1Image)
print(nifti_image.shape) # e.g., (256, 256, 176)
Disabling decoding for raw metadata
from datasets import Dataset, Nifti
ds = Dataset.from_dict({
    "nifti": ["path/to/brain_scan.nii.gz"]
}).cast_column("nifti", Nifti(decode=False))
# Returns raw path/bytes dict instead of a nibabel object
print(ds[0]["nifti"])
# {'bytes': None, 'path': 'path/to/brain_scan.nii.gz'}
Encoding a nibabel image object directly
import nibabel as nib
import numpy as np
from datasets import Dataset, Nifti
# Create a NIfTI image programmatically
data = np.random.rand(64, 64, 32).astype(np.float32)
affine = np.eye(4)
img = nib.Nifti1Image(data, affine)
# Encode into a dataset
nifti_feature = Nifti()
encoded = nifti_feature.encode_example(img)
ds = Dataset.from_dict({"scan": [encoded]}).cast_column("scan", Nifti())
Related Pages
Principles
- NIfTI Feature Handling -- Principle for encoding, decoding, and managing NIfTI neuroimaging data within HuggingFace Datasets.
Environments
- Huggingface Datasets -- The parent library providing the feature type system and dataset infrastructure.
- Medical Imaging -- Domain context for neuroimaging data handling.
Related Implementations
- Image -- Similar binary feature type for image data, following the same bytes/path storage pattern.
- Audio -- Similar binary feature type for audio data.