Implementation:Huggingface Datasets Version
Overview
Version represents a dataset version in MAJOR.MINOR.PATCH format. The class is a Python dataclass decorated with @total_ordering, so instances support the full set of comparison operators. It is a core metadata type used throughout the datasets library to track and compare dataset revisions.
This module is part of the huggingface/datasets repository.
- Source file: src/datasets/utils/version.py (106 lines)
- Domain: Versioning, Metadata
- Import: from datasets import Version or from datasets.utils.version import Version
Class: Version
A dataclass representing a dataset version in MAJOR.MINOR.PATCH format with full ordering support.
@total_ordering
@dataclass
class Version:
    """Dataset version `MAJOR.MINOR.PATCH`."""
    version_str: str
    description: Optional[str] = None
    major: Optional[Union[str, int]] = None
    minor: Optional[Union[str, int]] = None
    patch: Optional[Union[str, int]] = None
Fields:
| Field | Type | Default | Description |
|---|---|---|---|
| version_str | str | (required) | The version string in MAJOR.MINOR.PATCH format |
| description | Optional[str] | None | A description of what is new in this version |
| major | Optional[Union[str, int]] | None | Major version number (set automatically by __post_init__) |
| minor | Optional[Union[str, int]] | None | Minor version number (set automatically by __post_init__) |
| patch | Optional[Union[str, int]] | None | Patch version number (set automatically by __post_init__) |
Methods
__post_init__
Parses the version_str and sets the major, minor, and patch fields automatically.
def __post_init__(self):
    self.major, self.minor, self.patch = _str_to_version_tuple(self.version_str)
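The pattern can be tried in isolation with a minimal sketch. MiniVersion below is a hypothetical stand-in, not the library class, and it inlines the parsing step instead of calling _str_to_version_tuple. It shows that a value passed explicitly for major is replaced by the one parsed from version_str.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Same pattern as the module's _VERSION_REG, written as one literal.
_VERSION_REG = re.compile(r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")


@dataclass
class MiniVersion:
    """Hypothetical stand-in used only for illustration."""

    version_str: str
    major: Optional[Union[str, int]] = None
    minor: Optional[Union[str, int]] = None
    patch: Optional[Union[str, int]] = None

    def __post_init__(self):
        # Runs right after the generated __init__ and overwrites the three fields.
        match = _VERSION_REG.match(self.version_str)
        if not match:
            raise ValueError(f"Invalid version '{self.version_str}'.")
        self.major, self.minor, self.patch = (int(g) for g in match.groups())


v = MiniVersion("2.10.3", major=99)  # the explicit major is discarded
print(v.major, v.minor, v.patch)  # 2 10 3
```

Because the parsed values always win, version_str is the single source of truth for the three numeric fields in this sketch.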
__repr__
Returns the version as a string in MAJOR.MINOR.PATCH format.
def __repr__(self):
    return f"{self.tuple[0]}.{self.tuple[1]}.{self.tuple[2]}"
tuple (property)
Returns the version as a tuple of (major, minor, patch).
@property
def tuple(self):
    return self.major, self.minor, self.patch
__eq__
Compares two Version instances for equality. Accepts both Version objects and version strings. Returns False for invalid operands instead of raising an exception.
def __eq__(self, other):
    try:
        other = self._validate_operand(other)
    except (TypeError, ValueError):
        return False
    else:
        return self.tuple == other.tuple
__lt__
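Orders two versions by comparing their (major, minor, patch) tuples. Like __eq__, it accepts both Version objects and version strings, but it lets the exception raised for an invalid operand propagate. Combined with @total_ordering, this supplies the remaining comparison operators (<=, >, >=).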
def __lt__(self, other):
    other = self._validate_operand(other)
    return self.tuple < other.tuple
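The difference between the two comparison methods is easiest to see side by side. The following is a rough sketch, not the library code: MiniVersion is a hypothetical stand-in, its parsing is simplified to a split on dots, and the body of _validate_operand is an assumption based on the description in Design Notes (strings are converted, anything else raises TypeError).

```python
from functools import total_ordering


@total_ordering
class MiniVersion:
    """Hypothetical stand-in; parsing is simplified for brevity."""

    def __init__(self, version_str):
        # int() raises ValueError for non-numeric parts.
        self.tuple = tuple(int(part) for part in version_str.split("."))

    def _validate_operand(self, other):
        # Assumed behavior: convert strings, pass instances through, reject the rest.
        if isinstance(other, str):
            return MiniVersion(other)
        if isinstance(other, MiniVersion):
            return other
        raise TypeError(f"{other!r} cannot be compared to a version.")

    def __eq__(self, other):
        try:
            other = self._validate_operand(other)
        except (TypeError, ValueError):
            return False
        return self.tuple == other.tuple

    def __lt__(self, other):
        other = self._validate_operand(other)  # errors propagate here
        return self.tuple < other.tuple


v = MiniVersion("1.2.0")
print(v < "1.10.0")          # True: components compare as integers, not text
print(v >= "1.2.0")          # True: derived by total_ordering from __lt__
print(v == 3)                # False: invalid operand, no exception
print(v == "not-a-version")  # False: the ValueError is swallowed

ordering_error = None
try:
    v < 3
except TypeError as exc:
    ordering_error = exc     # the ordering comparison raises instead

versions = [MiniVersion("1.10.0"), MiniVersion("1.2.0"), MiniVersion("1.9.5")]
sorted_tuples = [x.tuple for x in sorted(versions)]
print(sorted_tuples)         # [(1, 2, 0), (1, 9, 5), (1, 10, 0)]
```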
__hash__
Returns a hash of the normalized MAJOR.MINOR.PATCH string rebuilt from the parsed tuple (not of the raw version_str), enabling Version objects to be used in sets and as dictionary keys.
def __hash__(self):
    return hash(_version_tuple_to_str(self.tuple))
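Hashing the rebuilt string rather than the raw input keeps __hash__ consistent with __eq__: two objects that compare equal must hash equal. The sketch below uses a hypothetical MiniVersion stand-in with simplified parsing to show the consequence, including that a plain string can find a matching version in a set.

```python
class MiniVersion:
    """Hypothetical stand-in; mirrors only the eq/hash pairing described above."""

    def __init__(self, version_str):
        self.version_str = version_str
        self.tuple = tuple(int(part) for part in version_str.split("."))

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                other = MiniVersion(other)
            except ValueError:
                return False
        if not isinstance(other, MiniVersion):
            return False
        return self.tuple == other.tuple

    def __hash__(self):
        # Hash the normalized string, so "01.0.0" and "1.0.0" agree.
        return hash(".".join(str(v) for v in self.tuple))


a = MiniVersion("1.0.0")
b = MiniVersion("01.0.0")      # different raw string, same parsed tuple

print(a == b)                  # True
print(hash(a) == hash(b))      # True
print(len({a, b}))             # 1: the set treats them as one element
print("1.0.0" in {a})          # True: the string hashes and compares equal
```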
from_dict (classmethod)
Creates a Version instance from a dictionary. Only keys matching dataclass field names are used, allowing extra keys to be safely ignored.
@classmethod
def from_dict(cls, dic):
    field_names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in dic.items() if k in field_names})
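The key-filtering idiom works for any dataclass. As a small sketch, the hypothetical Record class below reuses the same from_dict body; the checksum key and its value are made up for the example.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical dataclass used to demonstrate the filtering idiom."""

    version_str: str
    description: Optional[str] = None

    @classmethod
    def from_dict(cls, dic):
        # Keep only keys that name a declared field; drop everything else.
        field_names = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in dic.items() if k in field_names})


raw = {"version_str": "3.0.0", "description": "Major update", "checksum": "abc123"}
rec = Record.from_dict(raw)
print(rec)  # Record(version_str='3.0.0', description='Major update')

# Filtering does not supply missing required fields: this still raises TypeError.
missing_error = None
try:
    Record.from_dict({"description": "no version here"})
except TypeError as exc:
    missing_error = exc
```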
_to_yaml_string
Returns the version string for YAML serialization.
def _to_yaml_string(self) -> str:
    return self.version_str
Helper Functions
_str_to_version_tuple
Parses a version string into a tuple of (major, minor, patch) integers. Raises ValueError if the string does not match the expected x.y.z format.
_VERSION_REG = re.compile(r"^(?P<major>\d+)" r"\.(?P<minor>\d+)" r"\.(?P<patch>\d+)$")
def _str_to_version_tuple(version_str):
    res = _VERSION_REG.match(version_str)
    if not res:
        raise ValueError(f"Invalid version '{version_str}'. Format should be x.y.z with {{x,y,z}} being digits.")
    return tuple(int(v) for v in [res.group("major"), res.group("minor"), res.group("patch")])
_version_tuple_to_str
Converts a (major, minor, patch) tuple back into a dot-separated version string.
def _version_tuple_to_str(version_tuple):
    return ".".join(str(v) for v in version_tuple)
Dependencies
| Dependency | Type | Purpose |
|---|---|---|
| dataclasses | Standard library | Dataclass decorator and field introspection |
| re | Standard library | Regular expression parsing of version strings |
| functools.total_ordering | Standard library | Automatic generation of comparison methods from __eq__ and __lt__ |
| typing | Standard library | Type annotations (Optional, Union) |
Usage Example
from datasets import Version
# Create a version
v1 = Version("1.0.0")
v2 = Version("2.1.0", description="Added new split")
# Access components
print(v1.major, v1.minor, v1.patch) # 1 0 0
print(v1.tuple) # (1, 0, 0)
# Compare versions
print(v1 < v2) # True
print(v1 == "1.0.0") # True (string comparison supported)
# Create from dictionary
v3 = Version.from_dict({"version_str": "3.0.0", "description": "Major update"})
print(v3) # 3.0.0
# Use in sets and dicts
version_set = {v1, v2, v3}
Design Notes
- The @total_ordering decorator is used so that only __eq__ and __lt__ need to be implemented; all other comparison operators are automatically derived.
- The __post_init__ method ensures that the major, minor, and patch fields are always populated from version_str, even though they are declared as optional fields on the dataclass. This allows Version to be constructed with just a version string.
- The _validate_operand helper enables comparisons with plain strings by transparently converting them to Version objects.
- The __eq__ method catches TypeError and ValueError to return False for invalid operands, while __lt__ allows these exceptions to propagate. This mirrors Python's built-in types, where == against a mismatched operand is False and an ordering comparison raises.
- The regex _VERSION_REG enforces strict MAJOR.MINOR.PATCH format with digits only; no pre-release or build metadata suffixes are supported.
- The from_dict classmethod filters dictionary keys to only those that match dataclass fields, making it safe to use with dictionaries that contain extra keys (e.g., from JSON or YAML deserialization).
File Location
- Repository: huggingface/datasets
- Full path: src/datasets/utils/version.py