Overview
An enum, provided by the Hugging Face Datasets library, that selects which integrity checks (file checksums and split checks) are run when a dataset is downloaded and generated.
Description
VerificationMode is a Python enum.Enum with three members that control the level of integrity checking performed when downloading and generating datasets. It is passed to functions like load_dataset via the verification_mode parameter. The three modes range from no verification (fastest) to full checksum and split validation (most thorough). The default mode is BASIC_CHECKS, which validates splits without computing file checksums.
Usage
Use VerificationMode when you want to explicitly control the level of data integrity checking. Pass it as the verification_mode parameter to load_dataset or dataset builder methods. Use ALL_CHECKS for production pipelines requiring strict integrity, BASIC_CHECKS for general use, and NO_CHECKS for rapid development iteration.
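The guidance above can be sketched as a small selection helper. This is only an illustration: `pick_verification_mode` and the stage names `"production"` and `"development"` are hypothetical and not part of the datasets library, and the enum is redefined locally (mirroring the signature on this page) so the snippet runs without the library installed.

```python
import enum


# Local stand-in that mirrors the enum documented on this page, so the sketch
# runs on its own. In real code: `from datasets import VerificationMode`.
class VerificationMode(enum.Enum):
    ALL_CHECKS = "all_checks"
    BASIC_CHECKS = "basic_checks"
    NO_CHECKS = "no_checks"


def pick_verification_mode(stage: str) -> VerificationMode:
    """Hypothetical helper: map a pipeline stage name to a verification mode."""
    if stage == "production":
        # Strict integrity: split checks plus downloaded-file checks.
        return VerificationMode.ALL_CHECKS
    if stage == "development":
        # Fast iteration: skip verification entirely.
        return VerificationMode.NO_CHECKS
    # Anything else falls back to the documented default.
    return VerificationMode.BASIC_CHECKS
```

The returned member would then be passed as `verification_mode=...` to `load_dataset`.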
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/utils/info_utils.py
- Lines: 22-40
Signature
class VerificationMode(enum.Enum):
    """Enum that specifies which verification checks to run.

    The default mode is BASIC_CHECKS, which will perform only rudimentary checks
    to avoid slowdowns when generating/downloading a dataset for the first time.

    The verification modes:

    | ALL_CHECKS             | Split checks and validity (number of files, checksums) of downloaded files |
    | BASIC_CHECKS (default) | Same as ALL_CHECKS but without checking downloaded files                   |
    | NO_CHECKS              | None                                                                       |
    """

    ALL_CHECKS = "all_checks"
    BASIC_CHECKS = "basic_checks"
    NO_CHECKS = "no_checks"
Import
from datasets import VerificationMode
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| value | str | Yes | One of "all_checks", "basic_checks", or "no_checks". Typically accessed via the enum member (e.g. VerificationMode.ALL_CHECKS). |
Outputs
| Name | Type | Description |
|------|------|-------------|
| member | VerificationMode | An enum member representing the selected verification level. |
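The contract above is standard `enum.Enum` lookup by value, which can be sketched as follows. `parse_mode` is a hypothetical wrapper added here only to give a clearer error message, and the enum is again a local copy of the documented signature rather than an import from the library.

```python
import enum


# Local copy of the documented enum so the example is self-contained.
class VerificationMode(enum.Enum):
    ALL_CHECKS = "all_checks"
    BASIC_CHECKS = "basic_checks"
    NO_CHECKS = "no_checks"


def parse_mode(value):
    """Hypothetical helper: turn a string (or an existing member) into a member."""
    try:
        # Enum lookup by value; passing a member returns that same member.
        return VerificationMode(value)
    except ValueError:
        valid = ", ".join(m.value for m in VerificationMode)
        raise ValueError(
            f"Unknown verification mode {value!r}; expected one of: {valid}"
        ) from None
```

Any value outside the three documented strings raises `ValueError`, which is plain `enum.Enum` behavior and not specific to the library.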
Enum Members
| Member | Value | Description |
|--------|-------|-------------|
| ALL_CHECKS | "all_checks" | Validates both downloaded file checksums/sizes and split names/example counts. Most thorough but slowest. |
| BASIC_CHECKS | "basic_checks" | Validates split names and example counts only. Skips file checksum verification. This is the default. |
| NO_CHECKS | "no_checks" | Skips all verification. Fastest mode, suitable for development or trusted data. |
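As a rough way to read the table, each mode can be mapped to the checks it enables. The function `checks_for` and the keys `"splits"` and `"downloaded_files"` are illustrative names chosen for this sketch, not identifiers from the datasets library; the library's internal dispatch may be organized differently.

```python
import enum


# Local copy of the documented enum so the example is self-contained.
class VerificationMode(enum.Enum):
    ALL_CHECKS = "all_checks"
    BASIC_CHECKS = "basic_checks"
    NO_CHECKS = "no_checks"


def checks_for(mode: VerificationMode) -> dict:
    """Illustrative summary of the table: which checks a mode enables."""
    return {
        # Split names and example counts: every mode except NO_CHECKS.
        "splits": mode is not VerificationMode.NO_CHECKS,
        # Checksums/sizes of downloaded files: ALL_CHECKS only.
        "downloaded_files": mode is VerificationMode.ALL_CHECKS,
    }
```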
Usage Examples
Basic Usage
from datasets import load_dataset, VerificationMode

# Load with full integrity checks
ds = load_dataset(
    "cornell-movie-review-data/rotten_tomatoes",
    verification_mode=VerificationMode.ALL_CHECKS,
)
Skip Verification for Speed
from datasets import load_dataset, VerificationMode

# Skip all checks during development
ds = load_dataset(
    "cornell-movie-review-data/rotten_tomatoes",
    verification_mode=VerificationMode.NO_CHECKS,
)
Using String Value
from datasets import load_dataset

# String values are also accepted
ds = load_dataset(
    "cornell-movie-review-data/rotten_tomatoes",
    verification_mode="no_checks",
)