Implementation:Huggingface Datasets Exceptions
Overview
The exceptions module defines the exception hierarchy for the Hugging Face Datasets library. Every custom exception derives from the base DatasetsError class. Apart from the standalone DefunctDatasetError, the exceptions fall into four families: file or dataset not found errors, dataset build errors (including generation and format issues), checksum verification errors, and split verification errors. Together they provide structured error handling for data loading, processing, and validation.
Source File
| Property | Value |
|---|---|
| Repository | huggingface/datasets |
| File | src/datasets/exceptions.py |
| Lines | 119 |
| Domain | Error_Handling |
Import
from datasets.exceptions import DatasetsError, DatasetNotFoundError
# Or import specific exceptions as needed:
from datasets.exceptions import (
    DatasetsError,
    DefunctDatasetError,
    FileNotFoundDatasetsError,
    DataFilesNotFoundError,
    DatasetNotFoundError,
    DatasetBuildError,
    ManualDownloadError,
    FileFormatError,
    DatasetGenerationError,
    DatasetGenerationCastError,
    ChecksumVerificationError,
    UnexpectedDownloadedFileError,
    ExpectedMoreDownloadedFilesError,
    NonMatchingChecksumError,
    SplitsVerificationError,
    UnexpectedSplitsError,
    ExpectedMoreSplitsError,
    NonMatchingSplitsSizesError,
)
Exception Hierarchy
Exception
+-- DatasetsError (base class for all datasets exceptions)
    +-- DefunctDatasetError
    +-- FileNotFoundDatasetsError (also inherits FileNotFoundError)
    |   +-- DataFilesNotFoundError
    |   +-- DatasetNotFoundError
    +-- DatasetBuildError
    |   +-- ManualDownloadError
    |   +-- FileFormatError
    |   +-- DatasetGenerationError
    |       +-- DatasetGenerationCastError
    +-- ChecksumVerificationError
    |   +-- UnexpectedDownloadedFileError
    |   +-- ExpectedMoreDownloadedFilesError
    |   +-- NonMatchingChecksumError
    +-- SplitsVerificationError
        +-- UnexpectedSplitsError
        +-- ExpectedMoreSplitsError
        +-- NonMatchingSplitsSizesError
Classes
DatasetsError
Base class for all exceptions in the datasets library.
class DatasetsError(Exception):
    """Base class for exceptions in this library."""
DefunctDatasetError
Raised when attempting to access a dataset that has been marked as defunct (no longer available or supported).
FileNotFoundDatasetsError
A FileNotFoundError subclass specific to the datasets library. Inherits from both DatasetsError and Python's built-in FileNotFoundError, allowing it to be caught by either exception type.
class FileNotFoundDatasetsError(DatasetsError, FileNotFoundError):
    """FileNotFoundError raised by this library."""
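Because of the dual inheritance, code that already handles the built-in FileNotFoundError keeps working when the library raises its own subclass. A minimal sketch of that behavior follows. It uses local stand-in classes that mirror the two definitions above instead of importing the library, and `open_data_file` and `caught_as` are hypothetical helpers introduced only for illustration.

```python
# Stand-ins mirroring the definitions in datasets.exceptions (not imported here).
class DatasetsError(Exception):
    """Base class for exceptions in this library."""


class FileNotFoundDatasetsError(DatasetsError, FileNotFoundError):
    """FileNotFoundError raised by this library."""


def open_data_file(path: str) -> None:
    # Hypothetical helper that always fails, for illustration only.
    raise FileNotFoundDatasetsError(f"No data file at {path}")


def caught_as(exc_type: type) -> bool:
    """Return True if the error raised by open_data_file matches exc_type."""
    try:
        open_data_file("data/train.csv")
    except exc_type:
        return True
    except Exception:
        # Raised, but not an instance of exc_type.
        return False
    return False


print(caught_as(DatasetsError))      # library-wide handler
print(caught_as(FileNotFoundError))  # built-in handler
print(caught_as(OSError))            # FileNotFoundError is itself an OSError
```

The same reasoning applies to DataFilesNotFoundError and DatasetNotFoundError, since both derive from FileNotFoundDatasetsError.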
DataFilesNotFoundError
Raised when no supported data files are found at the expected location.
DatasetNotFoundError
Raised when trying to access a missing dataset, or a private/gated dataset when the user is not authenticated.
DatasetBuildError
Base class for errors that occur during the dataset build process.
ManualDownloadError
Raised when a dataset requires manual download steps that have not been completed.
FileFormatError
Raised when a data file has an unsupported or invalid format.
DatasetGenerationError
Raised when an error occurs during dataset generation (the process of creating Arrow tables from raw data).
DatasetGenerationCastError
A specialized DatasetGenerationError with a class method from_cast_error that constructs a detailed error message when data files have mismatched columns. This method:
- Extracts details from the CastError about what columns differ.
- Traces through tracked generator kwargs to identify which specific data file caused the error.
- Resolves Hugging Face Hub URLs to human-readable paths.
- Provides a help message linking to the Hub documentation about multiple configurations.
class DatasetGenerationCastError(DatasetGenerationError):
    @classmethod
    def from_cast_error(
        cls,
        cast_error: CastError,
        builder_name: str,
        gen_kwargs: dict[str, Any],
        token: Optional[Union[bool, str]],
    ) -> "DatasetGenerationCastError":
        explanation_message = (
            f"\n\nAll the data files must have the same columns, but at some point {cast_error.details()}"
        )
        formatted_tracked_gen_kwargs: list[str] = []
        for gen_kwarg in gen_kwargs.values():
            if not isinstance(gen_kwarg, (tracked_str, tracked_list, TrackedIterableFromGenerator)):
                continue
            while (
                isinstance(gen_kwarg, (tracked_list, TrackedIterableFromGenerator)) and gen_kwarg.last_item is not None
            ):
                gen_kwarg = gen_kwarg.last_item
            if isinstance(gen_kwarg, tracked_str):
                gen_kwarg = gen_kwarg.get_origin()
            if isinstance(gen_kwarg, str) and gen_kwarg.startswith("hf://"):
                resolved_path = HfFileSystem(endpoint=config.HF_ENDPOINT, token=token).resolve_path(gen_kwarg)
                gen_kwarg = "hf://" + resolved_path.unresolve()
                if "@" + resolved_path.revision in gen_kwarg:
                    gen_kwarg = (
                        gen_kwarg.replace("@" + resolved_path.revision, "", 1)
                        + f" (at revision {resolved_path.revision})"
                    )
            formatted_tracked_gen_kwargs.append(str(gen_kwarg))
        if formatted_tracked_gen_kwargs:
            explanation_message += f"\n\nThis happened while the {builder_name} dataset builder was generating data using\n\n{', '.join(formatted_tracked_gen_kwargs)}"
        help_message = "\n\nPlease either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)"
        return cls("An error occurred while generating the dataset" + explanation_message + help_message)
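The revision handling in the method above is plain string manipulation, so it can be illustrated without any Hub access. The sketch below isolates that step. `format_revision` is a hypothetical name, and the example path and revision are made up; the exact shape of an unresolved Hub path is an assumption here.

```python
def format_revision(unresolved_path: str, revision: str) -> str:
    """Move an '@revision' marker out of the path and into a trailing note."""
    marker = "@" + revision
    if marker in unresolved_path:
        # Only the first occurrence is removed, as in the method above.
        return unresolved_path.replace(marker, "", 1) + f" (at revision {revision})"
    # No marker present: the path is reported unchanged.
    return unresolved_path


# Hypothetical path, used for illustration only.
path = "hf://datasets/user/repo@abc123/data/train.csv"
print(format_revision(path, "abc123"))
# prints: hf://datasets/user/repo/data/train.csv (at revision abc123)
```

Keeping the revision as a trailing note leaves the reported path readable while still telling the user which revision was read.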
ChecksumVerificationError
Base class for errors raised during checksum verification of downloaded files.
UnexpectedDownloadedFileError
Raised when downloaded files were not expected (extra files present).
ExpectedMoreDownloadedFilesError
Raised when some expected files were not downloaded (missing files).
NonMatchingChecksumError
Raised when the checksum of a downloaded file does not match the expected checksum.
SplitsVerificationError
Base class for errors raised during split verification.
UnexpectedSplitsError
Raised when unexpected splits are found in the downloaded data.
ExpectedMoreSplitsError
Raised when some recorded/expected splits are missing from the data.
NonMatchingSplitsSizesError
Raised when split sizes do not match the expected sizes.
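The three split errors correspond to three distinct ways in which recorded splits can disagree with expected ones. The sketch below shows one way such a check could map onto them. It is not the library's verification routine: the classes are local stand-ins, `check_splits` and `error_name` are hypothetical helpers, and splits are modeled as plain name-to-example-count dictionaries.

```python
# Stand-ins; in the library, SplitsVerificationError derives from DatasetsError.
class SplitsVerificationError(Exception):
    pass


class UnexpectedSplitsError(SplitsVerificationError):
    pass


class ExpectedMoreSplitsError(SplitsVerificationError):
    pass


class NonMatchingSplitsSizesError(SplitsVerificationError):
    pass


def check_splits(expected: dict[str, int], recorded: dict[str, int]) -> None:
    """Compare split names and example counts (hypothetical helper)."""
    missing = set(expected) - set(recorded)
    if missing:
        raise ExpectedMoreSplitsError(str(sorted(missing)))
    extra = set(recorded) - set(expected)
    if extra:
        raise UnexpectedSplitsError(str(sorted(extra)))
    # Names match at this point, so every expected key exists in recorded.
    bad = sorted(name for name in expected if expected[name] != recorded[name])
    if bad:
        raise NonMatchingSplitsSizesError(str(bad))


def error_name(expected: dict[str, int], recorded: dict[str, int]) -> str:
    """Return the name of the raised split error, or 'ok' if none is raised."""
    try:
        check_splits(expected, recorded)
    except SplitsVerificationError as err:  # the base class catches all three
        return type(err).__name__
    return "ok"


print(error_name({"train": 100, "test": 20}, {"train": 100, "test": 19}))
# prints: NonMatchingSplitsSizesError
```

Catching the base SplitsVerificationError is enough when the caller only needs to know that verification failed; the subclasses matter when the recovery differs per case.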
Dependencies
| Module | Purpose |
|---|---|
| huggingface_hub.HfFileSystem | Resolving Hub file paths in DatasetGenerationCastError |
| datasets.config | Hub endpoint configuration |
| datasets.table.CastError | Cast error details for column mismatch diagnostics |
| datasets.utils.track | Tracked types for tracing data file origins |
Usage
from datasets import load_dataset
from datasets.exceptions import DatasetBuildError, DatasetNotFoundError, DatasetsError

# Catching specific errors during dataset loading
try:
    dataset = load_dataset("nonexistent/dataset")
except DatasetNotFoundError:
    print("Dataset not found or requires authentication")
except DatasetBuildError as e:
    print(f"Error building dataset: {e}")

# Catching all datasets errors
try:
    dataset = load_dataset("some_dataset")
except DatasetsError as e:
    print(f"Datasets library error: {e}")