Implementation:Huggingface Datasets Exceptions

From Leeroopedia

Overview

Exceptions defines the exception hierarchy for the Hugging Face Datasets library. All custom exceptions derive from the base DatasetsError class. Apart from the standalone DefunctDatasetError, they fall into three groups: file/data not found errors, dataset build errors (including generation and format issues), and verification errors (with separate base classes for checksums and for splits). The module gives callers structured error handling for data loading, processing, and validation.

Source File

Property    Value
Repository  huggingface/datasets
File        src/datasets/exceptions.py
Lines       119
Domain      Error_Handling

Import

from datasets.exceptions import DatasetsError, DatasetNotFoundError
# Or import specific exceptions as needed:
from datasets.exceptions import (
    DatasetsError,
    DefunctDatasetError,
    FileNotFoundDatasetsError,
    DataFilesNotFoundError,
    DatasetNotFoundError,
    DatasetBuildError,
    ManualDownloadError,
    FileFormatError,
    DatasetGenerationError,
    DatasetGenerationCastError,
    ChecksumVerificationError,
    UnexpectedDownloadedFileError,
    ExpectedMoreDownloadedFilesError,
    NonMatchingChecksumError,
    SplitsVerificationError,
    UnexpectedSplitsError,
    ExpectedMoreSplitsError,
    NonMatchingSplitsSizesError,
)

Exception Hierarchy

Exception
  +-- DatasetsError (base class for all datasets exceptions)
        +-- DefunctDatasetError
        +-- FileNotFoundDatasetsError (also inherits FileNotFoundError)
        |     +-- DataFilesNotFoundError
        |     +-- DatasetNotFoundError
        +-- DatasetBuildError
        |     +-- ManualDownloadError
        |     +-- FileFormatError
        |     +-- DatasetGenerationError
        |           +-- DatasetGenerationCastError
        +-- ChecksumVerificationError
        |     +-- UnexpectedDownloadedFileError
        |     +-- ExpectedMoreDownloadedFilesError
        |     +-- NonMatchingChecksumError
        +-- SplitsVerificationError
              +-- UnexpectedSplitsError
              +-- ExpectedMoreSplitsError
              +-- NonMatchingSplitsSizesError
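
A tree like the one above can be regenerated from the classes themselves by walking `__subclasses__()`. The sketch below uses local stand-in classes that mirror only part of the hierarchy, so it runs without the library installed; `render_tree` is a hypothetical helper name, not part of datasets. With the real library, passing `datasets.exceptions.DatasetsError` should work the same way.

```python
# Stand-in classes mirroring part of the hierarchy (assumption: these are
# local copies for illustration, not imports from the datasets library).
class DatasetsError(Exception): ...
class FileNotFoundDatasetsError(DatasetsError, FileNotFoundError): ...
class DatasetNotFoundError(FileNotFoundDatasetsError): ...
class DatasetBuildError(DatasetsError): ...
class DatasetGenerationError(DatasetBuildError): ...
class DatasetGenerationCastError(DatasetGenerationError): ...


def render_tree(cls: type, depth: int = 0) -> list[str]:
    """Return one indented line per class, depth-first, in definition order."""
    lines = ["  " * depth + cls.__name__]
    for sub in cls.__subclasses__():
        lines.extend(render_tree(sub, depth + 1))
    return lines


tree = render_tree(DatasetsError)
print("\n".join(tree))
```

Because every class descends from DatasetsError, a single walk from the base class reaches the whole hierarchy.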

Classes

DatasetsError

Base class for all exceptions in the datasets library.

class DatasetsError(Exception):
    """Base class for exceptions in this library."""

DefunctDatasetError

Raised when attempting to access a dataset that has been marked as defunct (no longer available or supported).

FileNotFoundDatasetsError

A FileNotFoundError subclass specific to the datasets library. Inherits from both DatasetsError and Python's built-in FileNotFoundError, allowing it to be caught by either exception type.

class FileNotFoundDatasetsError(DatasetsError, FileNotFoundError):
    """FileNotFoundError raised by this library."""
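
The effect of the dual inheritance can be checked with a small sketch. The two classes below are local stand-ins with the same bases as the definitions above, and `caught_by` is a hypothetical helper used only for this illustration.

```python
class DatasetsError(Exception):
    """Stand-in for datasets.exceptions.DatasetsError."""


class FileNotFoundDatasetsError(DatasetsError, FileNotFoundError):
    """Stand-in with the same two base classes as the library class."""


def caught_by(handler_type: type) -> bool:
    """Report whether an `except handler_type` clause catches the error."""
    try:
        raise FileNotFoundDatasetsError("data.csv is missing")
    except handler_type:
        return True
    except Exception:
        return False


results = {
    t.__name__: caught_by(t)
    for t in (DatasetsError, FileNotFoundError, OSError, ValueError)
}
print(results)
```

Code written against the built-in FileNotFoundError (or its parent OSError) therefore keeps working, while library-aware code can catch DatasetsError instead.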

DataFilesNotFoundError

Raised when no supported data files are found at the expected location.

DatasetNotFoundError

Raised when trying to access a missing dataset, or a private/gated dataset when the user is not authenticated.

DatasetBuildError

Base class for errors that occur during the dataset build process.

ManualDownloadError

Raised when a dataset requires manual download steps that have not been completed.

FileFormatError

Raised when a data file has an unsupported or invalid format.

DatasetGenerationError

Raised when an error occurs during dataset generation (the process of creating Arrow tables from raw data).

DatasetGenerationCastError

A specialized DatasetGenerationError with a class method from_cast_error that constructs a detailed error message when data files have mismatched columns. This method:

  1. Extracts details from the CastError about what columns differ.
  2. Traces through tracked generator kwargs to identify which specific data file caused the error.
  3. Resolves Hugging Face Hub URLs to human-readable paths.
  4. Provides a help message linking to the Hub documentation about multiple configurations.

class DatasetGenerationCastError(DatasetGenerationError):
    @classmethod
    def from_cast_error(
        cls,
        cast_error: CastError,
        builder_name: str,
        gen_kwargs: dict[str, Any],
        token: Optional[Union[bool, str]],
    ) -> "DatasetGenerationCastError":
        explanation_message = (
            f"\n\nAll the data files must have the same columns, but at some point {cast_error.details()}"
        )
        formatted_tracked_gen_kwargs: list[str] = []
        for gen_kwarg in gen_kwargs.values():
            if not isinstance(gen_kwarg, (tracked_str, tracked_list, TrackedIterableFromGenerator)):
                continue
            while (
                isinstance(gen_kwarg, (tracked_list, TrackedIterableFromGenerator)) and gen_kwarg.last_item is not None
            ):
                gen_kwarg = gen_kwarg.last_item
            if isinstance(gen_kwarg, tracked_str):
                gen_kwarg = gen_kwarg.get_origin()
            if isinstance(gen_kwarg, str) and gen_kwarg.startswith("hf://"):
                resolved_path = HfFileSystem(endpoint=config.HF_ENDPOINT, token=token).resolve_path(gen_kwarg)
                gen_kwarg = "hf://" + resolved_path.unresolve()
                if "@" + resolved_path.revision in gen_kwarg:
                    gen_kwarg = (
                        gen_kwarg.replace("@" + resolved_path.revision, "", 1)
                        + f" (at revision {resolved_path.revision})"
                    )
            formatted_tracked_gen_kwargs.append(str(gen_kwarg))
        if formatted_tracked_gen_kwargs:
            explanation_message += f"\n\nThis happened while the {builder_name} dataset builder was generating data using\n\n{', '.join(formatted_tracked_gen_kwargs)}"
        help_message = "\n\nPlease either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)"
        return cls("An error occurred while generating the dataset" + explanation_message + help_message)
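
Steps 2 and 3 can be illustrated in isolation. The sketch below is a simplified approximation: `TrackedList` is a stand-in for the library's tracked types (assumed here to record the most recently yielded element in `last_item`), and `innermost_last_item` and `describe_hub_path` are hypothetical helper names. No Hub call is made; the revision is passed in directly instead of being resolved through HfFileSystem.

```python
class TrackedList(list):
    """Simplified stand-in: remembers the last item handed out by iteration."""

    def __init__(self, *args):
        super().__init__(*args)
        self.last_item = None

    def __iter__(self):
        for item in super().__iter__():
            self.last_item = item
            yield item


def innermost_last_item(value):
    """Follow last_item through nested tracked containers (step 2)."""
    while isinstance(value, TrackedList) and value.last_item is not None:
        value = value.last_item
    return value


def describe_hub_path(path: str, revision: str) -> str:
    """Move an '@revision' marker out of the path into a suffix (step 3)."""
    marker = "@" + revision
    if marker in path:
        return path.replace(marker, "", 1) + f" (at revision {revision})"
    return path


# Simulate a generator that fails while reading the second file.
files = TrackedList(["a.csv", "b.csv", "c.csv"])
for name in files:
    if name == "b.csv":
        break
culprit = innermost_last_item(files)

print(culprit)
print(describe_hub_path("hf://datasets/user/repo@abc123/data/train.csv", "abc123"))
```

Tracking the last yielded item is what lets the error message name the specific file being processed when the cast failed, rather than the whole list of data files.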

ChecksumVerificationError

Base class for errors raised during checksum verification of downloaded files.

UnexpectedDownloadedFileError

Raised when some downloaded files were not expected (extra files are present).

ExpectedMoreDownloadedFilesError

Raised when some expected files were not downloaded (missing files).

NonMatchingChecksumError

Raised when the checksum of a downloaded file does not match the expected checksum.
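
The three checksum errors correspond to three distinct ways a download can disagree with the recorded metadata. The following is a minimal sketch of that mapping, assuming checksums are compared as two dictionaries keyed by URL. The classes are local stand-ins, and `check_downloads` and `outcome` are hypothetical names; the library's own verification routine may differ in detail.

```python
# Local stand-ins for the checksum branch of the hierarchy.
class DatasetsError(Exception): ...
class ChecksumVerificationError(DatasetsError): ...
class UnexpectedDownloadedFileError(ChecksumVerificationError): ...
class ExpectedMoreDownloadedFilesError(ChecksumVerificationError): ...
class NonMatchingChecksumError(ChecksumVerificationError): ...


def check_downloads(expected: dict[str, str], recorded: dict[str, str]) -> None:
    """Raise the matching error if recorded checksums disagree with expected."""
    missing = set(expected) - set(recorded)
    if missing:
        raise ExpectedMoreDownloadedFilesError(str(sorted(missing)))
    extra = set(recorded) - set(expected)
    if extra:
        raise UnexpectedDownloadedFileError(str(sorted(extra)))
    mismatched = [url for url in expected if expected[url] != recorded[url]]
    if mismatched:
        raise NonMatchingChecksumError(str(mismatched))


def outcome(expected: dict[str, str], recorded: dict[str, str]) -> str:
    """Return the name of the raised error class, or 'ok' if none is raised."""
    try:
        check_downloads(expected, recorded)
    except ChecksumVerificationError as err:
        return type(err).__name__
    return "ok"


print(outcome({"u1": "aa"}, {"u1": "aa"}))
print(outcome({"u1": "aa", "u2": "bb"}, {"u1": "aa"}))
print(outcome({"u1": "aa"}, {"u1": "aa", "u3": "cc"}))
print(outcome({"u1": "aa"}, {"u1": "zz"}))
```

Catching the shared base class ChecksumVerificationError, as `outcome` does, handles all three cases at once while still exposing which one occurred.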

SplitsVerificationError

Base class for errors raised during split verification.

UnexpectedSplitsError

Raised when unexpected splits are found in the downloaded data.

ExpectedMoreSplitsError

Raised when some recorded/expected splits are missing from the data.

NonMatchingSplitsSizesError

Raised when split sizes do not match the expected sizes.

Dependencies

Module                        Purpose
huggingface_hub.HfFileSystem  Resolving Hub file paths in DatasetGenerationCastError
datasets.config               Hub endpoint configuration
datasets.table.CastError      Cast error details for column mismatch diagnostics
datasets.utils.track          Tracked types for tracing data file origins

Usage

from datasets import load_dataset
from datasets.exceptions import DatasetNotFoundError, DatasetBuildError

# Catching specific errors during dataset loading
try:
    dataset = load_dataset("nonexistent/dataset")
except DatasetNotFoundError:
    print("Dataset not found or requires authentication")
except DatasetBuildError as e:
    print(f"Error building dataset: {e}")

# Catching all datasets errors
from datasets.exceptions import DatasetsError

try:
    dataset = load_dataset("some_dataset")
except DatasetsError as e:
    print(f"Datasets library error: {e}")
