Implementation:Iterative Dvc Utils Core Helpers



Knowledge Sources
Domains: Utilities, Hashing, Path_Operations
Last Updated: 2026-02-10 10:00 GMT

Overview

A collection of general-purpose utility functions shared across the DVC codebase. The module covers content hashing (MD5, SHA-256), cross-platform path operations, terminal output formatting, environment-variable fixes for PyInstaller and pyenv, stage target parsing, and small error-handling helpers.

Source: dvc/utils/__init__.py (415 lines)

Signature

import os
import re
from typing import Optional, TextIO

LARGE_DIR_SIZE = 100
TARGET_REGEX = re.compile(r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$")

def bytes_hash(byts, typ): ...
def dict_filter(d, exclude=()): ...
def dict_hash(d, typ, exclude=()): ...
def dict_md5(d, **kwargs): ...
def dict_sha256(d, **kwargs): ...
def is_binary(): ...
def fix_env(env=None): ...
def colorize(message, color=None, style=None): ...
def boxify(message, border_color=None): ...
def relpath(path, start=os.curdir): ...
def as_posix(path: str) -> str: ...
def env2bool(var, undefined=False): ...
def resolve_output(inp: str, out: Optional[str], force=False) -> str: ...
def resolve_paths(repo, out, always_local=False): ...
def format_link(link): ...
def error_link(name): ...
def parse_target(target: str, default=None, isa_glob=False) -> tuple[Optional[str], Optional[str]]: ...
def glob_targets(targets, glob=True, recursive=True): ...
def error_handler(func): ...
def errored_revisions(rev_data: dict) -> list: ...
def isatty(stream: "Optional[TextIO]") -> bool: ...

Import

from dvc.utils import dict_md5, relpath, parse_target, resolve_output

Description

Hashing Functions

Function Signature Description
bytes_hash (byts, typ) Hashes raw bytes using the specified algorithm (e.g., "md5", "sha256"). Uses hashlib with usedforsecurity=False.
dict_filter (d, exclude=()) Recursively filters keys from nested dicts/lists, returning a copy with specified keys excluded.
dict_hash (d, typ, exclude=()) Filters a dict, serializes it to JSON with sorted keys, and hashes the result using the specified algorithm.
dict_md5 (d, **kwargs) Convenience wrapper for dict_hash(d, "md5", ...).
dict_sha256 (d, **kwargs) Convenience wrapper for dict_hash(d, "sha256", ...).
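The hashing helpers form a simple chain: dict_filter removes excluded keys, dict_hash serializes the result as JSON with sorted keys, and bytes_hash digests the bytes. The following is a minimal standalone sketch of that chain based on the descriptions above, not a copy of dvc/utils/__init__.py; it assumes Python 3.9+ (for the usedforsecurity keyword of hashlib.new) and UTF-8 encoding of the JSON text.

```python
import hashlib
import json


def bytes_hash(byts: bytes, typ: str) -> str:
    # hashlib.new accepts usedforsecurity on Python 3.9+.
    hasher = hashlib.new(typ, usedforsecurity=False)
    hasher.update(byts)
    return hasher.hexdigest()


def dict_filter(d, exclude=()):
    # Recurse through lists and dicts, dropping any key listed in `exclude`.
    if not exclude or not isinstance(d, (list, dict)):
        return d
    if isinstance(d, list):
        return [dict_filter(item, exclude) for item in d]
    return {k: dict_filter(v, exclude) for k, v in d.items() if k not in exclude}


def dict_hash(d, typ, exclude=()):
    # sort_keys=True makes the serialization (and so the digest) independent of key order.
    filtered = dict_filter(d, exclude)
    byts = json.dumps(filtered, sort_keys=True).encode("utf-8")
    return bytes_hash(byts, typ)


def dict_md5(d, **kwargs):
    return dict_hash(d, "md5", **kwargs)


def dict_sha256(d, **kwargs):
    return dict_hash(d, "sha256", **kwargs)
```

Because keys are sorted before hashing, two dicts that differ only in key order, or only in excluded keys, produce the same digest.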

Path Operations

Function Signature Description
relpath (path, start=os.curdir) Cross-platform wrapper around os.path.relpath. On Windows, a path on a different drive than start has no relative form, so it is returned unchanged.
as_posix (path: str) -> str Converts Windows-style backslashes to POSIX forward slashes.
resolve_output (inp, out, force=False) -> str Resolves an output path from an input URL. If out is a directory, appends the basename of inp. Raises FileExistsLocallyError if the target exists and force is False.
resolve_paths (repo, out, always_local=False) Resolves working directory, output path, and .dvc file path for an output. Handles URL schemes, symlink detection, and Windows drive letters. Returns (path, wdir, out) tuple.
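The resolve_output rules can be sketched as below. This is an illustration under assumptions: FileExistsLocallyError is a hypothetical local stand-in for the class in dvc.exceptions, the fallback to the bare basename when out is empty is assumed rather than documented above, and as_posix is shown as a plain separator replacement.

```python
import os
from typing import Optional
from urllib.parse import urlparse


class FileExistsLocallyError(Exception):
    """Hypothetical stand-in for dvc.exceptions.FileExistsLocallyError."""


def resolve_output(inp: str, out: Optional[str], force: bool = False) -> str:
    # Basename of the input's path component; works for URLs and local paths.
    name = os.path.basename(os.path.normpath(urlparse(inp).path))
    if not out:
        ret = name  # assumption: no explicit output means "use the input's name"
    elif os.path.isdir(out):
        ret = os.path.join(out, name)  # directory target: keep the input's name
    else:
        ret = out  # explicit file path is used as-is
    if os.path.exists(ret) and not force:
        raise FileExistsLocallyError(f"'{ret}' already exists; re-run with force=True to override")
    return ret


def as_posix(path: str) -> str:
    # Replace every Windows backslash separator with a forward slash.
    return path.replace("\\", "/")
```

For example, resolving "https://example.com/data/file.csv" against an existing directory yields that directory joined with "file.csv", and a second call fails once the file exists unless force is set.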

Terminal Formatting

Function Signature Description
colorize (message, color=None, style=None) Wraps a message with colorama ANSI codes. Supported colors: green, yellow, blue, red, magenta, cyan. Supported styles: dim, bold.
boxify (message, border_color=None) Draws an ASCII box around a multi-line message with horizontal padding of 5 and vertical padding of 1. Handles ANSI-aware width calculation via _visual_width().
format_link (link) Wraps a URL in angle brackets and colors the URL itself (not the brackets) cyan using colorama codes.
error_link (name) Returns a formatted link to https://error.dvc.org/{name}.
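A rough sketch of the color helpers. DVC builds these strings from colorama constants; to stay dependency-free, this version hard-codes the standard ANSI SGR sequences that colorama's Fore and Style values correspond to. Returning the message unchanged when no color is given is an assumption of the sketch.

```python
# Standard ANSI SGR escape sequences (the values colorama's Fore/Style constants hold).
RESET_ALL = "\x1b[0m"
FORE_RESET = "\x1b[39m"
STYLES = {"dim": "\x1b[2m", "bold": "\x1b[1m"}
COLORS = {
    "green": "\x1b[32m",
    "yellow": "\x1b[33m",
    "blue": "\x1b[34m",
    "red": "\x1b[31m",
    "magenta": "\x1b[35m",
    "cyan": "\x1b[36m",
}


def colorize(message: str, color=None, style=None) -> str:
    # Assumption: without a color the message is returned untouched.
    if not color:
        return message
    # Style first, then color, then a full reset so later output is unaffected.
    return f"{STYLES.get(style, '')}{COLORS.get(color, '')}{message}{RESET_ALL}"


def format_link(link: str) -> str:
    # The angle brackets stay uncolored; only the URL is cyan.
    return f"<{COLORS['cyan']}{link}{FORE_RESET}>"


def error_link(name: str) -> str:
    return format_link(f"https://error.dvc.org/{name}")
```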

Environment Handling

Function Signature Description
is_binary () Returns True if running inside a PyInstaller bundle (checks sys.frozen).
fix_env (env=None) Returns a copy of the environment (os.environ when env is None) with PyInstaller and pyenv modifications reversed. For PyInstaller bundles (is_binary() is True): restores LD_LIBRARY_PATH from LD_LIBRARY_PATH_ORIG. For pyenv: when pyenv's exported variables are present, strips the PATH entries it prepends (PYENV_BIN_PATH, bin_path, plugin_bin); this step is skipped on Windows.
env2bool (var, undefined=False) Reads an environment variable and converts it to a boolean: True if the value contains 1, y, yes, or true (case-insensitive regex search), otherwise False. Returns undefined if the variable is not set.
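The sketch below illustrates env2bool and the PyInstaller half of fix_env. It rests on several assumptions: the environ parameter is a hypothetical addition for testability, restore_ld_library_path is a hypothetical helper (the real fix_env applies this step only when is_binary() is true and also handles pyenv), and dropping LD_LIBRARY_PATH when no _ORIG backup exists is assumed.

```python
import os
import re
from typing import Mapping, Optional


def env2bool(var: str, undefined=False, environ: Optional[Mapping[str, str]] = None):
    # `environ` is a hypothetical parameter so the sketch can be tested without os.environ.
    environ = os.environ if environ is None else environ
    value = environ.get(var)
    if value is None:
        return undefined
    # re.search means any value *containing* 1, y, yes, or true is truthy.
    return bool(re.search("1|y|yes|true", value, flags=re.IGNORECASE))


def restore_ld_library_path(env: Mapping[str, str]) -> dict:
    # Hypothetical helper: only the PyInstaller step of fix_env, without the is_binary() check.
    fixed = dict(env)  # work on a copy, like fix_env
    orig = fixed.get("LD_LIBRARY_PATH_ORIG")
    if orig is not None:
        fixed["LD_LIBRARY_PATH"] = orig
    else:
        fixed.pop("LD_LIBRARY_PATH", None)  # assumption: no backup means drop the bundle's value
    return fixed
```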

Target Parsing

Function Signature Description
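parse_target (target, default=None, isa_glob=False) -> tuple[Optional[str], Optional[str]] Parses a DVC target string of the form path:name or path:name@key and returns a (path, name) tuple. When the path part is empty, default is used (dvc.yaml if default is None). With isa_glob=True, the target is split on its last colon instead. Bare paths are checked with is_valid_filename(), and a path pointing at dvc.lock is rejected with a hint to use the matching .yaml file.
glob_targets (targets, glob=True, recursive=True) Expands a list of target patterns using Python's glob.iglob(), passing recursive through. Returns targets unchanged when glob is False, and raises DvcException if the patterns match nothing.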
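parse_target can be sketched around TARGET_REGEX as follows. The constants and is_valid_filename below are hypothetical stand-ins for the dvc.dvcfile and dvc.parsing imports (assuming "@" as the JOIN separator, dvc.yaml as the default file, and that .dvc files and dvc.yaml are the valid filenames), ValueError replaces DvcException, and treating a bare token that is not a valid filename as a stage name is an assumption about how the validation step behaves.

```python
import os
import re
from typing import Optional

# Same pattern as TARGET_REGEX above: a lazy path, then an optional ":name" suffix.
TARGET_REGEX = re.compile(r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$")
JOIN = "@"  # assumption: separator used by dvc.parsing.JOIN
PROJECT_FILE = "dvc.yaml"  # assumption: default pipeline file
LOCK_FILE = "dvc.lock"


def is_valid_filename(path: str) -> bool:
    # Hypothetical stand-in for dvc.dvcfile.is_valid_filename.
    return path.endswith(".dvc") or os.path.basename(path) == PROJECT_FILE


def parse_target(target: str, default: Optional[str] = None, isa_glob: bool = False):
    if not target:
        return None, None
    default = default or PROJECT_FILE
    if isa_glob:
        # Glob mode: everything after the last colon is the name pattern.
        path, _, glob = target.rpartition(":")
        return path or default, glob or None

    # Split off an "@key" suffix first so the regex only sees "path:name".
    group, _, key = target.partition(JOIN)
    match = TARGET_REGEX.match(group)
    if not match:
        return target, None
    path, name = match.group("path"), match.group("name")
    if name and key:
        name += f"{JOIN}{key}"

    if path:
        if os.path.basename(path) == LOCK_FILE:
            # ValueError stands in for DvcException.
            raise ValueError(f"Did you mean: `{target.replace('.lock', '.yaml', 1)}`?")
        if not name:
            # Assumption: a bare token is a file if it looks like one, else a stage name.
            return (target, None) if is_valid_filename(target) else (None, target)
    return path or default, name
```

In this sketch, splitting off the @key suffix before matching means a colon inside the key never reaches the regex.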

Error Handling

Function Signature Description
error_handler (func) Decorator that catches any exception raised by the wrapped function and passes it to the optional onerror callback taken from the call's keyword arguments. Always returns a dict, which holds the function's result under the "data" key on success.
errored_revisions (rev_data: dict) -> list Scans revision data for nested "error" keys and returns a list of revision identifiers that contain errors.
isatty (stream) -> bool Safely checks if a stream is a TTY, returning False for None streams.
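A sketch of the error-handling pattern. The onerror calling convention here (the partial result dict and the exception, with onerror removed from the kwargs passed to the wrapped function) is a simplifying assumption; DVC's wrapper may forward arguments differently. nested_contains is a hypothetical stand-in for dvc.utils.collections.nested_contains.

```python
import functools
from typing import Any, Callable, Optional, TextIO


def error_handler(func: Callable[..., Any]) -> Callable[..., dict]:
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> dict:
        # Assumption: onerror is consumed here rather than forwarded to func.
        onerror = kwargs.pop("onerror", None)
        result: dict = {}
        try:
            result["data"] = func(*args, **kwargs)
        except Exception as exc:  # deliberately broad: every exception is routed to onerror
            if onerror is not None:
                onerror(result, exc)
        return result

    return wrapper


def nested_contains(data: Any, key: str) -> bool:
    # Hypothetical stand-in: True if `key` appears in any dict nested inside `data`.
    if isinstance(data, dict):
        return key in data or any(nested_contains(v, key) for v in data.values())
    if isinstance(data, list):
        return any(nested_contains(v, key) for v in data)
    return False


def errored_revisions(rev_data: dict) -> list:
    return [rev for rev, data in rev_data.items() if nested_contains(data, "error")]


def isatty(stream: Optional[TextIO]) -> bool:
    # sys.stdout can be None (for example under pythonw on Windows), so guard first.
    return stream is not None and stream.isatty()
```

A typical onerror callback records the failure inside the result dict, for example onerror=lambda result, exc: result.update(error=str(exc)); errored_revisions can then pick out revisions whose data contains such an "error" key.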

Internal Helpers

def _split(list_to_split, chunk_size):
    """Split a list into chunks of the given size."""

def _visual_width(line):
    """Get the number of columns required to display a string, stripping ANSI codes."""

def _visual_center(line, width):
    """Center-align a string according to its visual width (ANSI-aware)."""

Constants

Constant Value Description
LARGE_DIR_SIZE 100 Threshold for large directory handling
TARGET_REGEX r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$" Regex for parsing path:name target strings

Dependencies

Dependency Usage
hashlib Content hashing (MD5, SHA256)
json Dict serialization for hashing
colorama ANSI color codes and AnsiToWin32.ANSI_CSI_RE for visual width calculation
dvc.dvcfile DVC_FILE_SUFFIX, LOCK_FILE, PROJECT_FILE, is_valid_filename
dvc.exceptions FileExistsLocallyError, DvcException
dvc.parsing JOIN constant for target parsing
dvc.utils.fs contains_symlink_up_to for symlink detection
dvc.utils.collections nested_contains for error scanning
