Environment:Huggingface Datasets PyTorch Integration
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Data_Processing |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
Optional PyTorch integration environment enabling torch tensor output formatting and PyTorch DataLoader compatibility for HuggingFace Datasets.
Description
This environment enables PyTorch tensor output from HuggingFace Datasets. When PyTorch is available, the library registers a TorchFormatter that converts Arrow table data to PyTorch tensors on data access. The library uses lazy runtime imports to avoid hard dependencies -- PyTorch is detected via `importlib.util.find_spec("torch")` and controlled by the `USE_TORCH` environment variable.
Usage
Required when calling `dataset.set_format("torch")` or `dataset.with_format("torch")`, and when using `torch.utils.data.DataLoader` with a HuggingFace Dataset.
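A minimal sketch of the two usage paths named above, assuming both `datasets` and `torch` are installed. The helper name `torch_batch_shapes` and the column names `x` and `y` are made up for illustration; the function returns `None` instead of failing when either package is missing.

```python
import importlib.util


def torch_batch_shapes():
    """Return the shape of each "x" batch, or None if torch/datasets is missing."""
    if any(importlib.util.find_spec(name) is None for name in ("torch", "datasets")):
        return None

    # Imported lazily so this sketch still loads without the optional packages.
    from datasets import Dataset
    from torch.utils.data import DataLoader

    # Toy in-memory dataset; "x" and "y" are illustrative column names.
    ds = Dataset.from_dict(
        {"x": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], "y": [0, 1, 0, 1]}
    ).with_format("torch")

    # Rows now come back as torch tensors, so the default collate function
    # can stack them into batches.
    loader = DataLoader(ds, batch_size=2)
    return [tuple(batch["x"].shape) for batch in loader]


if __name__ == "__main__":
    print(torch_batch_shapes())
```

`with_format` returns a new dataset object, while `set_format` changes the format of the existing one in place; either works as the input to `DataLoader`.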
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | All platforms supported |
| Hardware | CPU or NVIDIA GPU | GPU optional, for accelerated tensor operations |
Dependencies
Python Packages
- `torch` (no minimum version enforced by datasets, tests use >= 2.8.0)
- `datasets` (core package)
Credentials
No credentials required. The `USE_TORCH` environment variable can be set to control auto-detection:
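- `USE_TORCH=1` or `USE_TORCH=TRUE`: Enable the PyTorch integration. Per the `config.py` excerpt under Code Evidence, the `find_spec("torch")` check still runs, so PyTorch must be installed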
- `USE_TORCH=0` or `USE_TORCH=FALSE`: Force disable PyTorch
- `USE_TORCH=AUTO` (default): Auto-detect based on availability
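The sketch below shows how these values could be resolved, following the `config.py` excerpt quoted under Code Evidence. The helper `resolve_torch_available` and its `module` parameter are hypothetical, and the set of truthy spellings is an assumption: the excerpt names `ENV_VARS_TRUE_AND_AUTO_VALUES` but does not list its members.

```python
import importlib.util
import os

# Assumed spellings; the excerpt does not show the contents of
# ENV_VARS_TRUE_AND_AUTO_VALUES.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
TRUE_AND_AUTO_VALUES = TRUE_VALUES | {"AUTO"}


def resolve_torch_available(environ=os.environ, module="torch"):
    """Hypothetical helper mirroring the detection logic in the excerpt."""
    setting = environ.get("USE_TORCH", "AUTO").upper()
    if setting not in TRUE_AND_AUTO_VALUES:
        # Any other value (for example "0" or "FALSE") skips detection entirely.
        return False
    # Both the truthy values and AUTO still depend on the package being importable.
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    print("torch available:", resolve_torch_available())
```

Because the library reads the variable in `config.py`, it presumably has to be set before `datasets` is first imported; changing it afterwards in the same process is unlikely to have any effect.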
Quick Install
pip install "datasets[torch]"
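After installing, a quick check along these lines can confirm that both distributions are visible to the interpreter. It is a sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
import importlib.metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("datasets", "torch"):
        print(name, installed_version(name) or "not installed")
```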
Code Evidence
PyTorch detection from `config.py:49-56`:
USE_TORCH = os.environ.get("USE_TORCH", "AUTO").upper()
TORCH_AVAILABLE = False
TORCH_VERSION = "N/A"
if USE_TORCH in ENV_VARS_TRUE_AND_AUTO_VALUES:
    TORCH_AVAILABLE = importlib.util.find_spec("torch") is not None
    if TORCH_AVAILABLE:
        TORCH_VERSION = version.parse(importlib.metadata.version("torch"))
Formatter registration from `formatting/__init__.py:90-96`:
if config.TORCH_AVAILABLE:
    from .torch_formatter import TorchFormatter
    _register_formatter(TorchFormatter, "torch")
else:
    _register_unavailable_formatter(
        ValueError("PyTorch needs to be installed to be able to return PyTorch tensors."), "torch"
    )
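The registration pattern in the excerpt above can be illustrated with a small standalone registry. This is a simplified sketch, not the library's implementation: every name in it (`register_formatter`, `register_unavailable_formatter`, `get_formatter`, `fake_backend`) is hypothetical. The idea is that an unavailable backend stores an error at import time and raises it only when that format is requested.

```python
# Hypothetical miniature of a formatter registry with deferred errors.
_FORMATTERS = {}
_UNAVAILABLE = {}


def register_formatter(name, factory):
    """Register a callable that builds the formatter for `name`."""
    _FORMATTERS[name] = factory


def register_unavailable_formatter(name, error):
    """Remember the error to raise if `name` is requested later."""
    _UNAVAILABLE[name] = error


def get_formatter(name):
    if name in _FORMATTERS:
        return _FORMATTERS[name]()
    if name in _UNAVAILABLE:
        # The failure surfaces on use, not when the registry module is imported.
        raise _UNAVAILABLE[name]
    raise ValueError(f"Unknown format type: {name!r}")


# Stand-in registrations for the demo: `dict` plays the role of a formatter class.
register_formatter("python", dict)
register_unavailable_formatter(
    "fake_backend", ValueError("fake_backend needs to be installed.")
)

if __name__ == "__main__":
    print(get_formatter("python"))
    try:
        get_formatter("fake_backend")
    except ValueError as exc:
        print("deferred error:", exc)
```

Deferring the error this way keeps the import of the package itself cheap and safe for users who never ask for the optional format.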
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: PyTorch needs to be installed to be able to return PyTorch tensors.` | PyTorch not installed | `pip install torch` |
| `ModuleNotFoundError: No module named 'torch'` | PyTorch not available at runtime | Install PyTorch in the active environment (`pip install torch`); setting `USE_TORCH` alone does not make a missing package importable |
Compatibility Notes
- Windows: Fully supported for PyTorch integration.
- USE_TORCH precedence: If `USE_TORCH=TRUE` is set explicitly, PyTorch detection still runs even if `USE_TF=TRUE` is also set.
- Lazy imports: The TorchFormatter uses TYPE_CHECKING imports for type hints and lazy runtime imports, so PyTorch is only loaded when actually needed.