Implementation:Iterative Dvc Utils Core Helpers
| Knowledge Sources | |
|---|---|
| Domains | Utilities, Hashing, Path_Operations |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Concrete collection of general-purpose utility functions shared across the DVC codebase: content hashing (MD5, SHA256), cross-platform path operations, terminal output formatting, environment-variable fixes for PyInstaller and pyenv, stage target parsing, and small error-handling helpers.
Source: dvc/utils/__init__.py (415 lines)
Signature
```python
import os
import re
from typing import Optional, TextIO

LARGE_DIR_SIZE = 100
TARGET_REGEX = re.compile(r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$")
def bytes_hash(byts, typ): ...
def dict_filter(d, exclude=()): ...
def dict_hash(d, typ, exclude=()): ...
def dict_md5(d, **kwargs): ...
def dict_sha256(d, **kwargs): ...
def is_binary(): ...
def fix_env(env=None): ...
def colorize(message, color=None, style=None): ...
def boxify(message, border_color=None): ...
def relpath(path, start=os.curdir): ...
def as_posix(path: str) -> str: ...
def env2bool(var, undefined=False): ...
def resolve_output(inp: str, out: Optional[str], force=False) -> str: ...
def resolve_paths(repo, out, always_local=False): ...
def format_link(link): ...
def error_link(name): ...
def parse_target(target: str, default=None, isa_glob=False) -> tuple[Optional[str], Optional[str]]: ...
def glob_targets(targets, glob=True, recursive=True): ...
def error_handler(func): ...
def errored_revisions(rev_data: dict) -> list: ...
def isatty(stream: "Optional[TextIO]") -> bool: ...
```
Import
```python
from dvc.utils import dict_md5, relpath, parse_target, resolve_output
```
Description
Hashing Functions
| Function | Signature | Description |
|---|---|---|
| bytes_hash | (byts, typ) | Hashes raw bytes with the named hashlib algorithm (e.g., "md5", "sha256"), passing usedforsecurity=False, and returns the hex digest. |
| dict_filter | (d, exclude=()) | When exclude is non-empty, recursively walks nested dicts and lists and returns a copy with those keys removed at every level. |
| dict_hash | (d, typ, exclude=()) | Filters the dict with dict_filter, serializes it to JSON with sorted keys, and hashes the result with the specified algorithm. |
| dict_md5 | (d, **kwargs) | Convenience wrapper for dict_hash(d, "md5", **kwargs). |
| dict_sha256 | (d, **kwargs) | Convenience wrapper for dict_hash(d, "sha256", **kwargs). |
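
The hashing helpers form a three-step pipeline: filter, serialize with sorted keys, hash. The sketch below re-creates that pipeline from the table above rather than copying the DVC source; it assumes Python 3.9+, where hashlib.new() accepts usedforsecurity.

```python
import hashlib
import json


def bytes_hash(byts: bytes, typ: str) -> str:
    # usedforsecurity=False marks this as a non-security use, so algorithms
    # such as MD5 remain available on FIPS-restricted Python builds.
    hasher = hashlib.new(typ, usedforsecurity=False)
    hasher.update(byts)
    return hasher.hexdigest()


def dict_filter(d, exclude=()):
    # With nothing to exclude, or for non-container values, return the input as-is.
    if not exclude or not isinstance(d, (list, dict)):
        return d
    if isinstance(d, list):
        return [dict_filter(e, exclude) for e in d]
    return {k: dict_filter(v, exclude) for k, v in d.items() if k not in exclude}


def dict_hash(d, typ, exclude=()):
    filtered = dict_filter(d, exclude)
    # sort_keys=True makes the serialized form independent of insertion order.
    byts = json.dumps(filtered, sort_keys=True).encode("utf-8")
    return bytes_hash(byts, typ)


def dict_md5(d, **kwargs):
    return dict_hash(d, "md5", **kwargs)


def dict_sha256(d, **kwargs):
    return dict_hash(d, "sha256", **kwargs)


a = {"cmd": "python train.py", "meta": {"note": "x"}, "deps": ["data"]}
b = {"deps": ["data"], "cmd": "python train.py", "meta": {"note": "y"}}
# Key order is irrelevant, and excluded keys are ignored at every depth.
print(dict_md5(a, exclude=("note",)) == dict_md5(b, exclude=("note",)))  # True
```

Sorting the keys is what keeps the digest stable for dicts built in different orders, while exclude is one way to keep fields that should not affect the hash out of it.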
Path Operations
| Function | Signature | Description |
|---|---|---|
| relpath | (path, start=os.curdir) | Cross-platform relative path calculation. On Windows, a path on a different drive from start has no relative form, so it is returned unchanged. |
| as_posix | (path: str) -> str | Converts Windows-style backslashes to POSIX forward slashes. |
| resolve_output | (inp, out, force=False) -> str | Resolves a local output path for the input URL inp. If out is a directory, appends the basename of inp; if out is not given, that basename alone is used. Raises FileExistsLocallyError if the resulting path exists and force is False. |
| resolve_paths | (repo, out, always_local=False) | Resolves the working directory, output path, and .dvc file path for an output. Handles URL schemes, symlink detection, and Windows drive letters. Returns a (path, wdir, out) tuple. |
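
The branching in relpath and resolve_output is easiest to follow in code. This is a sketch reconstructed from the descriptions above, not the DVC source: FileExistsLocallyError is a local stand-in for the DVC exception, and reducing inp to a file name via urllib.parse.urlparse is an assumption about how URLs such as s3://bucket/data/file.csv are handled.

```python
import ntpath
import os
import posixpath
from typing import Optional
from urllib.parse import urlparse


class FileExistsLocallyError(Exception):
    """Local stand-in for dvc.exceptions.FileExistsLocallyError."""


def relpath(path, start=os.curdir):
    path = os.fspath(path)
    start = os.path.abspath(os.fspath(start))
    # On Windows, paths on different drives share no common prefix and
    # have no relative form, so the path is returned unchanged.
    if os.name == "nt" and not os.path.commonprefix([start, os.path.abspath(path)]):
        return path
    return os.path.relpath(path, start)


def as_posix(path: str) -> str:
    return path.replace(ntpath.sep, posixpath.sep)


def resolve_output(inp: str, out: Optional[str], force: bool = False) -> str:
    # Assumption: the last component of the URL's path becomes the file name.
    name = os.path.basename(os.path.normpath(urlparse(inp).path))
    if not out:
        ret = name
    elif os.path.isdir(out):
        ret = os.path.join(out, name)
    else:
        ret = out
    if os.path.exists(ret) and not force:
        raise FileExistsLocallyError(ret)
    return ret
```

For example, resolve_output("s3://bucket/data/file.csv", ".") would yield ./file.csv, while passing a file path as out uses that path verbatim.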
Terminal Formatting
| Function | Signature | Description |
|---|---|---|
| colorize | (message, color=None, style=None) | Wraps a message in colorama ANSI codes followed by a reset; returns the message unchanged if no color is given. Supported colors: green, yellow, blue, red, magenta, cyan. Supported styles: dim, bold. |
| boxify | (message, border_color=None) | Draws an ASCII box around a multi-line message, with horizontal padding of 5 and vertical padding of 1. Line widths come from the ANSI-aware _visual_width(), so colored text is centered correctly. |
| format_link | (link) | Wraps a URL in angle brackets and colors the URL text cyan. |
| error_link | (name) | Returns a link to https://error.dvc.org/{name}, formatted with format_link. |
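
A minimal sketch of the color helpers follows. To keep it runnable without colorama, it hard-codes the standard ANSI SGR escape strings, which are assumed to match the colorama Fore and Style constants the real module uses; the error slug in the usage line is a placeholder.

```python
# Standard ANSI SGR codes standing in for colorama.Fore / colorama.Style.
FORE = {
    "red": "\x1b[31m",
    "green": "\x1b[32m",
    "yellow": "\x1b[33m",
    "blue": "\x1b[34m",
    "magenta": "\x1b[35m",
    "cyan": "\x1b[36m",
}
FORE_RESET = "\x1b[39m"  # resets the foreground color only
STYLES = {"bold": "\x1b[1m", "dim": "\x1b[2m"}
RESET_ALL = "\x1b[0m"  # resets color and style


def colorize(message, color=None, style=None):
    # Without a color the message passes through untouched, even if style is set.
    if not color:
        return message
    return f"{STYLES.get(style, '')}{FORE.get(color, '')}{message}{RESET_ALL}"


def format_link(link):
    # The angle brackets stay uncolored; only the URL text is cyan.
    return f"<{FORE['cyan']}{link}{FORE_RESET}>"


def error_link(name):
    return format_link(f"https://error.dvc.org/{name}")


print(colorize("done", color="green", style="bold"))
print(error_link("example-error"))  # placeholder slug
```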
Environment Handling
| Function | Signature | Description |
|---|---|---|
| is_binary | () | Returns True if running inside a PyInstaller bundle (checks sys.frozen). |
| fix_env | (env=None) | Returns a copy of env (os.environ by default) with PyInstaller and pyenv modifications reversed. For PyInstaller, restores LD_LIBRARY_PATH from LD_LIBRARY_PATH_ORIG. For pyenv, strips the pyenv-injected PATH entries (PYENV_BIN_PATH, bin_path, plugin_bin). |
| env2bool | (var, undefined=False) | Reads the environment variable var and converts it to a boolean: the value counts as true if it matches 1, y, yes, or true (case-insensitive). Returns undefined if the variable is not set. |
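
The sketch below re-creates is_binary, env2bool, and the PyInstaller half of fix_env; the pyenv PATH clean-up is left out. Two details are assumptions: that env2bool uses a regex search (so any value containing one of the listed tokens counts as true) and that fix_env drops LD_LIBRARY_PATH when no _ORIG backup exists.

```python
import os
import re
import sys


def is_binary():
    # PyInstaller sets sys.frozen inside bundled executables.
    return getattr(sys, "frozen", False)


def env2bool(var, undefined=False):
    value = os.getenv(var)
    if value is None:
        return undefined
    # Assumption: a substring search, not a full match, so "Yes" and "y" both count.
    return bool(re.search("1|y|yes|true", value, flags=re.IGNORECASE))


def fix_env(env=None):
    # Always work on a copy so the caller's mapping is never mutated.
    env = dict(os.environ if env is None else env)
    if is_binary():
        orig = env.get("LD_LIBRARY_PATH_ORIG")
        if orig is not None:
            env["LD_LIBRARY_PATH"] = orig
        else:
            # Assumption: with no backup, remove the bundle's value entirely.
            env.pop("LD_LIBRARY_PATH", None)
    # The pyenv PATH stripping is omitted from this sketch.
    return env
```

A search-based match is permissive; a stricter variant would compare the lowercased value against an explicit set of accepted strings.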
Target Parsing
| Function | Signature | Description |
|---|---|---|
| parse_target | (target, default=None, isa_glob=False) -> tuple[Optional[str], Optional[str]] | Parses a DVC target of the form path:name or path:name@key. In glob mode, splits on the last colon. When the path part is omitted, default is used (falling back to PROJECT_FILE). Validates filenames with is_valid_filename() and rejects .lock files. Returns a (path, name) tuple. |
| glob_targets | (targets, glob=True, recursive=True) | Expands each target pattern with Python's glob.iglob(). Returns targets unchanged when glob is False; raises DvcException if no matches are found. |
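
The parsing order (split off @key, match path:name, then classify bare arguments) can be sketched as follows. The constant values JOIN = "@", PROJECT_FILE = "dvc.yaml", and LOCK_FILE = "dvc.lock" are assumptions, is_valid_filename and DvcException are hypothetical local stand-ins, and treating a bare argument that is not a valid DVC filename as a stage name is this sketch's reading of the validation step.

```python
import os
import re
from glob import iglob
from typing import Optional

# Assumed values of the DVC constants.
JOIN = "@"  # dvc.parsing.JOIN
PROJECT_FILE = "dvc.yaml"  # dvc.dvcfile.PROJECT_FILE
LOCK_FILE = "dvc.lock"  # dvc.dvcfile.LOCK_FILE

TARGET_REGEX = re.compile(r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$")


class DvcException(Exception):
    """Local stand-in for dvc.exceptions.DvcException."""


def is_valid_filename(path: str) -> bool:
    # Hypothetical stand-in for dvc.dvcfile.is_valid_filename.
    name = os.path.basename(path)
    return name.endswith(".dvc") or name == PROJECT_FILE


def parse_target(
    target: str, default: Optional[str] = None, isa_glob: bool = False
) -> tuple[Optional[str], Optional[str]]:
    if not target:
        return None, None
    default = default or PROJECT_FILE
    if isa_glob:
        # Glob mode: everything after the last colon is the name pattern.
        path, _, pattern = target.rpartition(":")
        return path or default, pattern or None
    # Split off "@key" first, so the key never reaches the regex.
    group, _, key = target.partition(JOIN)
    match = TARGET_REGEX.match(group)
    if not match:
        return target, None
    path, name = match.group("path"), match.group("name")
    if name and key:
        name += f"{JOIN}{key}"
    if path:
        if os.path.basename(path) == LOCK_FILE:
            suggestion = target.replace(".lock", ".yaml", 1)
            raise DvcException(f"Did you mean: `{suggestion}`?")
        if not name:
            # A bare argument is a file if it looks like one, else a stage name.
            return (target, None) if is_valid_filename(target) else (None, target)
    return path or default, name


def glob_targets(targets, glob=True, recursive=True):
    if not glob:
        return targets
    results = [t for target in targets for t in iglob(target, recursive=recursive)]
    if not results:
        raise DvcException(f"Glob {targets} has no matches.")
    return results
```

Under these assumptions, "dvc.yaml:build@linux" parses to ("dvc.yaml", "build@linux") and a bare "train" to (None, "train").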
Error Handling
| Function | Signature | Description |
|---|---|---|
| error_handler | (func) | Decorator that catches any exception raised by func and passes it to the onerror callback taken from the call's kwargs, if one is provided. Returns a dict that holds the function's result under the "data" key when the call succeeds with a non-empty value. |
| errored_revisions | (rev_data: dict) -> list | Scans revision data for nested "error" keys and returns the list of revisions whose data contains one. |
| isatty | (stream) -> bool | Safely checks whether a stream is a TTY, returning False for a None stream. |
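
error_handler and errored_revisions are designed to work together: the decorator turns failures into data, and the scanner finds them afterwards. In this sketch, nested_contains is a hypothetical stand-in for dvc.utils.collections.nested_contains, the callback signature onerror(result, exc, **kwargs) is an assumption, and load_metrics and record_error are illustrative names.

```python
from typing import Optional, TextIO


def nested_contains(dictionary: dict, phrase: str) -> bool:
    # Hypothetical stand-in: True if `phrase` is a key with a truthy value
    # at any depth of nested dicts.
    for key, val in dictionary.items():
        if key == phrase and val:
            return True
        if isinstance(val, dict) and nested_contains(val, phrase):
            return True
    return False


def error_handler(func):
    def wrapper(*args, **kwargs):
        onerror = kwargs.get("onerror")
        result = {}
        try:
            vals = func(*args, **kwargs)
            if vals:  # falsy results leave "data" unset
                result["data"] = vals
        except Exception as exc:
            # Every exception is swallowed; the optional callback decides
            # what to record. Assumed signature: onerror(result, exc, **kwargs).
            if onerror is not None:
                onerror(result, exc, **kwargs)
        return result

    return wrapper


def errored_revisions(rev_data: dict) -> list:
    return [rev for rev, data in rev_data.items() if nested_contains(data, "error")]


def isatty(stream: Optional[TextIO]) -> bool:
    return False if stream is None else stream.isatty()


# Usage: collect per-revision results, recording failures instead of raising.
def record_error(result, exc, **kwargs):
    result["error"] = {"msg": str(exc)}


@error_handler
def load_metrics(rev, onerror=None):
    if rev == "broken":
        raise ValueError("cannot parse metrics file")
    return {"acc": 0.9}


data = {rev: load_metrics(rev, onerror=record_error) for rev in ("HEAD", "broken")}
print(errored_revisions(data))  # ['broken']
```

Note that onerror stays in kwargs, so the decorated function must accept it as a keyword argument.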
Internal Helpers
```python
def _split(list_to_split, chunk_size):
    """Split a list into chunks of the given size."""

def _visual_width(line):
    """Get the number of columns required to display a string, stripping ANSI codes."""

def _visual_center(line, width):
    """Center-align a string according to its visual width (ANSI-aware)."""
```
Constants
| Constant | Value | Description |
|---|---|---|
| LARGE_DIR_SIZE | 100 | Threshold for large directory handling |
| TARGET_REGEX | r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$" | Regex for parsing path:name target strings |
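
Both quantifiers in TARGET_REGEX are lazy, so the optional :name group is used only when the rest of the string cannot otherwise reach the end anchor. Because a name may not contain /, \, or :, a name is recognized only after the last colon and only if the trailing text has no path separators, which keeps a Windows drive colon such as C:\ from being read as a separator. The split_target helper below is hypothetical, written only to expose the two groups.

```python
import re

TARGET_REGEX = re.compile(r"(?P<path>.*?)(:(?P<name>[^\\/:]*))??$")


def split_target(target):
    # Hypothetical helper for illustration: return the regex's two groups.
    match = TARGET_REGEX.match(target)
    return match.group("path"), match.group("name")


for target in ["dvc.yaml:train", "train", ":train", "a:b:c", r"C:\proj\dvc.yaml"]:
    print(f"{target!r:22} -> {split_target(target)}")
```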
Dependencies
| Dependency | Usage |
|---|---|
| hashlib | Content hashing (MD5, SHA256) |
| json | Dict serialization for hashing |
| colorama | ANSI color codes, and AnsiToWin32.ANSI_CSI_RE for visual width calculation |
| dvc.dvcfile | DVC_FILE_SUFFIX, LOCK_FILE, PROJECT_FILE, is_valid_filename |
| dvc.exceptions | FileExistsLocallyError, DvcException |
| dvc.parsing | JOIN constant for target parsing |
| dvc.utils.fs | contains_symlink_up_to for symlink detection |
| dvc.utils.collections | nested_contains for error scanning |