Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Huggingface Datasets Version

From Leeroopedia

Overview

Version provides a semantic version representation for dataset versions in MAJOR.MINOR.PATCH format. The Version class is implemented as a Python dataclass with @total_ordering, enabling full comparison support between version instances. It is a core metadata type used throughout the datasets library to track and compare dataset revisions.

This module is part of the huggingface/datasets repository.

  • Source file: src/datasets/utils/version.py (106 lines)
  • Domain: Versioning, Metadata
  • Import: from datasets import Version or from datasets.utils.version import Version

Class: Version

A dataclass representing a dataset version in MAJOR.MINOR.PATCH format with full ordering support.

@total_ordering
@dataclass
class Version:
    """Dataset version `MAJOR.MINOR.PATCH`."""

    version_str: str
    description: Optional[str] = None
    major: Optional[Union[str, int]] = None
    minor: Optional[Union[str, int]] = None
    patch: Optional[Union[str, int]] = None

Fields:

Field Type Default Description
version_str str (required) The version string in MAJOR.MINOR.PATCH format
description Optional[str] None A description of what is new in this version
major Optional[Union[str, int]] None Major version number (set automatically by __post_init__)
minor Optional[Union[str, int]] None Minor version number (set automatically by __post_init__)
patch Optional[Union[str, int]] None Patch version number (set automatically by __post_init__)

Methods

__post_init__

Parses the version_str and sets the major, minor, and patch fields automatically.

def __post_init__(self):
    self.major, self.minor, self.patch = _str_to_version_tuple(self.version_str)

__repr__

Returns the version as a string in MAJOR.MINOR.PATCH format.

def __repr__(self):
    return f"{self.tuple[0]}.{self.tuple[1]}.{self.tuple[2]}"

tuple (property)

Returns the version as a tuple of (major, minor, patch).

@property
def tuple(self):
    return self.major, self.minor, self.patch

__eq__

Compares two Version instances for equality. Accepts both Version objects and version strings. Returns False for invalid operands instead of raising an exception.

def __eq__(self, other):
    try:
        other = self._validate_operand(other)
    except (TypeError, ValueError):
        return False
    else:
        return self.tuple == other.tuple

__lt__

Compares two Version instances for ordering. Combined with @total_ordering, this provides all comparison operators (<=, >, >=).

def __lt__(self, other):
    other = self._validate_operand(other)
    return self.tuple < other.tuple

__hash__

Returns a hash based on the version string representation, enabling Version objects to be used in sets and as dictionary keys.

def __hash__(self):
    return hash(_version_tuple_to_str(self.tuple))

from_dict (classmethod)

Creates a Version instance from a dictionary. Only keys matching dataclass field names are used, allowing extra keys to be safely ignored.

@classmethod
def from_dict(cls, dic):
    field_names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in dic.items() if k in field_names})

_to_yaml_string

Returns the version string for YAML serialization.

def _to_yaml_string(self) -> str:
    return self.version_str

Helper Functions

_str_to_version_tuple

Parses a version string into a tuple of (major, minor, patch) integers. Raises ValueError if the string does not match the expected x.y.z format.

_VERSION_REG = re.compile(r"^(?P<major>\d+)" r"\.(?P<minor>\d+)" r"\.(?P<patch>\d+)$")

def _str_to_version_tuple(version_str):
    res = _VERSION_REG.match(version_str)
    if not res:
        raise ValueError(f"Invalid version '{version_str}'. Format should be x.y.z with {{x,y,z}} being digits.")
    return tuple(int(v) for v in [res.group("major"), res.group("minor"), res.group("patch")])

_version_tuple_to_str

Converts a (major, minor, patch) tuple back into a dot-separated version string.

def _version_tuple_to_str(version_tuple):
    return ".".join(str(v) for v in version_tuple)

Dependencies

Dependency Type Purpose
dataclasses Standard library Dataclass decorator and field introspection
re Standard library Regular expression parsing of version strings
functools.total_ordering Standard library Automatic generation of comparison methods from __eq__ and __lt__
typing Standard library Type annotations (Optional, Union)

Usage Example

from datasets import Version

# Create a version
v1 = Version("1.0.0")
v2 = Version("2.1.0", description="Added new split")

# Access components
print(v1.major, v1.minor, v1.patch)  # 1 0 0
print(v1.tuple)  # (1, 0, 0)

# Compare versions
print(v1 < v2)   # True
print(v1 == "1.0.0")  # True (string comparison supported)

# Create from dictionary
v3 = Version.from_dict({"version_str": "3.0.0", "description": "Major update"})
print(v3)  # 3.0.0

# Use in sets and dicts
version_set = {v1, v2, v3}

Design Notes

  • The @total_ordering decorator is used so that only __eq__ and __lt__ need to be implemented; all other comparison operators are automatically derived.
  • The __post_init__ method ensures that the major, minor, and patch fields are always populated from version_str, even though they are declared as optional fields on the dataclass. This allows Version to be constructed with just a version string.
  • The _validate_operand helper enables comparisons with plain strings by transparently converting them to Version objects.
  • The __eq__ method catches TypeError and ValueError to return False for invalid operands, while __lt__ allows these exceptions to propagate, following standard Python comparison conventions.
  • The regex _VERSION_REG enforces strict MAJOR.MINOR.PATCH format with digits only -- no pre-release or build metadata suffixes are supported.
  • The from_dict classmethod filters dictionary keys to only those that match dataclass fields, making it safe to use with dictionaries that contain extra keys (e.g., from JSON or YAML deserialization).

File Location

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment