
Environment:Huggingface Datasets PyTorch Integration

From Leeroopedia
Knowledge Sources
Domains Deep_Learning, Data_Processing
Last Updated 2026-02-14 19:00 GMT

Overview

Optional PyTorch integration environment enabling torch tensor output formatting and PyTorch DataLoader compatibility for HuggingFace Datasets.

Description

This environment enables PyTorch tensor output from HuggingFace Datasets. When PyTorch is available, the library registers a `TorchFormatter` that converts Arrow table data to PyTorch tensors on data access. To avoid a hard dependency, the library imports PyTorch lazily at runtime: `config.py` detects it with `importlib.util.find_spec("torch")`, and the `USE_TORCH` environment variable controls whether that detection runs.

Usage

Required when calling `dataset.set_format("torch")` or `dataset.with_format("torch")`, and when using `torch.utils.data.DataLoader` with a HuggingFace Dataset.
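
As an illustration of the two entry points named above, the sketch below formats a small in-memory dataset as torch tensors and iterates over it with a `DataLoader`. The toy columns `x` and `y`, the helper names `torch_stack_available` and `demo_batches`, and the batch size are assumptions made for this example. The optional imports are deferred so the snippet still loads where `torch` or `datasets` is missing.

```python
import importlib.util


def torch_stack_available() -> bool:
    """Return True only if both `datasets` and `torch` can be located."""
    return all(
        importlib.util.find_spec(name) is not None for name in ("datasets", "torch")
    )


def demo_batches(batch_size: int = 2):
    """Return the shape of each `x` batch yielded by a DataLoader over a toy dataset."""
    # Imported here so this module can be loaded without the optional packages.
    import torch
    from datasets import Dataset
    from torch.utils.data import DataLoader

    ds = Dataset.from_dict(
        {"x": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], "y": [0, 1, 0, 1]}
    )
    ds = ds.with_format("torch")  # rows now come back as torch tensors
    assert isinstance(ds[0]["x"], torch.Tensor)

    loader = DataLoader(ds, batch_size=batch_size)
    return [tuple(batch["x"].shape) for batch in loader]


# Only run the demo when the optional stack is present.
shapes = demo_batches() if torch_stack_available() else None
print(shapes)
```

`with_format` returns a new formatted dataset, whereas `set_format` changes the format of the existing object in place; either should work with the loader in this sketch.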

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux, macOS, Windows | All platforms supported |
| Hardware | CPU or NVIDIA GPU | GPU optional, for accelerated tensor operations |

Dependencies

Python Packages

  • `torch` (`datasets` enforces no minimum version; its tests use >= 2.8.0)
  • `datasets` (core package)

Credentials

No credentials required. The `USE_TORCH` environment variable can be set to control auto-detection:

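  • `USE_TORCH=1` or `USE_TORCH=TRUE`: Enable PyTorch support; PyTorch must still be installed, because availability is still checked with `find_spec`
  • `USE_TORCH=0` or `USE_TORCH=FALSE`: Disable PyTorch support, even if PyTorch is installed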
  • `USE_TORCH=AUTO` (default): Auto-detect based on availability
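
The way these values interact with detection can be sketched as a small function. This is a simplified stand-in, not the library's code: `resolve_use_torch` is a hypothetical helper, and the accepted-value sets below contain only the spellings listed on this page (the real sets in `datasets.config` may accept more).

```python
import importlib.util
import os

# Assumption: only the spellings named on this page are modelled here.
TRUE_VALUES = {"1", "TRUE"}
TRUE_AND_AUTO_VALUES = TRUE_VALUES | {"AUTO"}


def resolve_use_torch(environ=os.environ, module_name="torch"):
    """Hypothetical helper: return (normalized setting, whether the module counts as available)."""
    setting = environ.get("USE_TORCH", "AUTO").upper()
    available = False
    if setting in TRUE_AND_AUTO_VALUES:
        # Enabled or AUTO: availability still depends on the package being found.
        available = importlib.util.find_spec(module_name) is not None
    return setting, available


# A disabling value short-circuits detection entirely.
print(resolve_use_torch({"USE_TORCH": "0"}))
# With no variable set, the default is AUTO and detection decides.
print(resolve_use_torch({}))
```

The `module_name` parameter exists only so the logic can be exercised without PyTorch installed, for example by pointing it at a standard-library module.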

Quick Install

pip install "datasets[torch]"

Code Evidence

PyTorch detection from `config.py:49-56`:

USE_TORCH = os.environ.get("USE_TORCH", "AUTO").upper()
TORCH_AVAILABLE = False
TORCH_VERSION = "N/A"
if USE_TORCH in ENV_VARS_TRUE_AND_AUTO_VALUES:
    TORCH_AVAILABLE = importlib.util.find_spec("torch") is not None
    if TORCH_AVAILABLE:
        TORCH_VERSION = version.parse(importlib.metadata.version("torch"))

Formatter registration from `formatting/__init__.py:90-96`:

if config.TORCH_AVAILABLE:
    from .torch_formatter import TorchFormatter
    _register_formatter(TorchFormatter, "torch")
else:
    _register_unavailable_formatter(
        ValueError("PyTorch needs to be installed to be able to return PyTorch tensors."), "torch"
    )
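
The registration pattern in this excerpt, where an unavailable backend stores an error that is raised only when the format is requested, can be sketched with a toy registry. All names below (`register_formatter`, `register_unavailable_formatter`, `get_formatter`, `PlainFormatter`) are hypothetical stand-ins for illustration and are not the library's internals.

```python
_FORMATTERS = {}   # format name -> formatter class
_UNAVAILABLE = {}  # format name -> error to raise when that format is requested


def register_formatter(formatter_cls, name):
    _FORMATTERS[name] = formatter_cls


def register_unavailable_formatter(error, name):
    _UNAVAILABLE[name] = error


def get_formatter(name):
    """Instantiate a registered formatter, or raise the error stored for a missing backend."""
    if name in _FORMATTERS:
        return _FORMATTERS[name]()
    if name in _UNAVAILABLE:
        raise _UNAVAILABLE[name]
    raise ValueError(f"Unknown format type: {name!r}")


class PlainFormatter:
    """Toy formatter that returns rows as plain dicts."""

    def format_row(self, row):
        return dict(row)


register_formatter(PlainFormatter, "plain")
# Simulate an environment where PyTorch was not detected.
register_unavailable_formatter(
    ValueError("PyTorch needs to be installed to be able to return PyTorch tensors."),
    "torch",
)

try:
    get_formatter("torch")
except ValueError as err:
    message = str(err)
    print(message)
```

Deferring the error this way means that importing the package never fails because an optional backend is missing; the failure surfaces only at the call that needs it.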

Common Errors

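| Error Message | Cause | Solution |
| --- | --- | --- |
| `ValueError: PyTorch needs to be installed to be able to return PyTorch tensors.` | PyTorch was not detected: it is not installed, or `USE_TORCH` is set to `0` or `FALSE` | `pip install torch`, and unset `USE_TORCH` if it disables PyTorch |
| `ModuleNotFoundError: No module named 'torch'` | PyTorch cannot be imported in the active environment at runtime | Install PyTorch in the environment that runs the code; setting `USE_TORCH` alone does not make it importable |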

Compatibility Notes

  • Windows: Fully supported for PyTorch integration.
  • `USE_TORCH` precedence: In the `config.py` excerpt above, PyTorch detection depends only on `USE_TORCH`, so it still runs when `USE_TF=TRUE` is also set.
  • Lazy imports: The TorchFormatter uses TYPE_CHECKING imports for type hints and lazy runtime imports, so PyTorch is only loaded when actually needed.
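
The lazy-import note can be illustrated with a minimal sketch of the general pattern. `LazyTorchFormatter` and `format_column` are hypothetical names chosen for this example and do not reproduce the real `TorchFormatter`.

```python
import importlib.util
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by static type checkers only; never executed at runtime.
    import torch


class LazyTorchFormatter:
    """Hypothetical formatter that defers importing torch until first use."""

    def format_column(self, values: list) -> "torch.Tensor":
        import torch  # runtime import happens here, not at module load

        return torch.tensor(values)


# Constructing the formatter does not require PyTorch.
formatter = LazyTorchFormatter()

# Only the actual conversion needs it, so guard the call on availability.
if importlib.util.find_spec("torch") is not None:
    print(formatter.format_column([1.0, 2.0]))
```

The string annotation `"torch.Tensor"` keeps the signature informative for type checkers without forcing the import when the module is loaded.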
