

Implementation:Iterative Dvc Testing Path Info

From Leeroopedia


Domains

Path_Abstraction, Testing

Overview

Concrete tool for representing and comparing local filesystem paths and URLs in DVC testing infrastructure. The module dvc/testing/path_info.py provides PathInfo (extending pathlib.PurePath) for local paths and URLInfo for URL-based paths, both supporting overlap detection, containment checking, and relative path computation. Specialized variants handle cloud storage paths, HTTP URLs with query parameters, and WebDAV URLs.

Description

PathInfo (local paths):

PathInfo extends pathlib.PurePath and the _BasePath mixin. On construction, it dispatches to WindowsPathInfo or PosixPathInfo based on the OS (for Python < 3.12). Key behaviors:

  • __str__() returns the path relative to the current working directory (via dvc.utils.relpath) rather than the absolute path.
  • __fspath__() returns the path string exactly as constructed (pathlib's own str()), so os.fspath() and file APIs still receive the unmodified path.
  • isin(other) checks whether this path is strictly inside another path using casefolded path parts.
  • overlaps(other) returns True if either path contains or equals the other.
  • relative_to(other) falls back to os.path.relpath, which can produce ".." components, when pathlib's relative_to would raise ValueError because other is not an ancestor of this path.
  • The scheme attribute is always "local".
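The containment rules above can be sketched with the standard library alone. The free functions below (isin, isin_or_eq, overlaps, relative_to) mirror the documented method names but are illustrative, not DVC's code. The case rule (casefold only for PureWindowsPath, exact comparison for POSIX) is an assumption modelled on pathlib's Windows and POSIX flavours.

```python
import ntpath
import posixpath
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def _cparts(path: PurePath) -> tuple:
    # Assumed case rule: Windows paths compare case-insensitively,
    # POSIX paths compare exactly (modelled on pathlib's flavours).
    if isinstance(path, PureWindowsPath):
        return tuple(part.casefold() for part in path.parts)
    return path.parts


def isin(path: PurePath, other: PurePath) -> bool:
    """True only if `path` is strictly inside `other`; equal paths give False."""
    mine, theirs = _cparts(path), _cparts(other)
    return len(mine) > len(theirs) and mine[: len(theirs)] == theirs


def isin_or_eq(path: PurePath, other: PurePath) -> bool:
    return path == other or isin(path, other)


def overlaps(path: PurePath, other: PurePath) -> bool:
    """True if either path contains or equals the other."""
    return isin_or_eq(path, other) or isin(other, path)


def relative_to(path: PurePath, other: PurePath) -> PurePath:
    """pathlib's relative_to, with a relpath fallback that may emit '..' parts."""
    try:
        return path.relative_to(other)
    except ValueError:
        # pathlib raises when `other` is not an ancestor; relpath does not.
        mod = ntpath if isinstance(path, PureWindowsPath) else posixpath
        return type(path)(mod.relpath(str(path), str(other)))
```

With these helpers, relative_to(PurePosixPath("/a/b/c"), PurePosixPath("/a/x")) gives PurePosixPath("../b/c") instead of raising, which is the fallback behaviour described in the list.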

URLInfo (URL paths):

URLInfo represents scheme-based URLs with components: scheme, host, user, port, and a POSIX path. Key behaviors:

  • Construction from a URL string via urlparse, or from parts via the from_parts() classmethod.
  • Path operations (/ operator, joinpath, parent, parents) produce new URLInfo instances.
  • isin(other) checks base parts equality and delegates path containment to the internal _URLPathInfo.
  • netloc is computed from host, user, and port; the port is omitted when it equals the scheme's default in DEFAULT_PORTS.
  • bucket is an alias for netloc.
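The URLInfo behaviours listed above (parsing, netloc with default-port elision, path joining that returns a new object, and containment as "same base parts plus path containment") can be approximated with urllib.parse and pathlib. SketchURL and its parse() classmethod are hypothetical names, and the port table is an illustrative assumption, not DVC's actual DEFAULT_PORTS values.

```python
import dataclasses
import posixpath
from pathlib import PurePosixPath
from typing import ClassVar, Dict, Optional
from urllib.parse import urlparse


@dataclasses.dataclass(frozen=True)
class SketchURL:
    """Hypothetical stand-in for URLInfo: base parts plus a POSIX path."""

    scheme: str
    host: Optional[str]
    user: Optional[str]
    port: Optional[int]
    path: str

    # Illustrative values only; the real DEFAULT_PORTS mapping may differ.
    DEFAULT_PORTS: ClassVar[Dict[str, int]] = {"http": 80, "https": 443, "ssh": 22}

    @classmethod
    def parse(cls, url: str) -> "SketchURL":
        p = urlparse(url)
        # An omitted port is filled from DEFAULT_PORTS so equivalent URLs compare equal.
        port = p.port or cls.DEFAULT_PORTS.get(p.scheme)
        return cls(p.scheme, p.hostname, p.username, port, p.path)

    @property
    def netloc(self) -> str:
        netloc = self.host or ""
        if self.user:
            netloc = f"{self.user}@{netloc}"
        # Drop the port when it is the scheme's default.
        if self.port and self.port != self.DEFAULT_PORTS.get(self.scheme):
            netloc += f":{self.port}"
        return netloc

    @property
    def url(self) -> str:
        return f"{self.scheme}://{self.netloc}{self.path}"

    def joinpath(self, *parts: str) -> "SketchURL":
        # Path operations return a new object; base parts are carried over.
        return dataclasses.replace(self, path=posixpath.join(self.path, *parts))

    __truediv__ = joinpath

    def isin(self, other: "SketchURL") -> bool:
        """Same scheme/host/user/port, and a path strictly below other's path."""
        mine = PurePosixPath(self.path).parts
        theirs = PurePosixPath(other.path).parts
        base = (self.scheme, self.host, self.user, self.port)
        other_base = (other.scheme, other.host, other.user, other.port)
        return base == other_base and len(mine) > len(theirs) and mine[: len(theirs)] == theirs
```

Storing the resolved default port, rather than None, is a design choice of this sketch: it makes "ssh://user@host/repo" and "ssh://user@host:22/repo" equal and gives both the same netloc.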

Specialized variants:

| Class | Inherits from | Behavior |
|---|---|---|
| CloudURLInfo | URLInfo | Strips the leading slash from the path property, suitable for S3/GCS-style bucket paths. |
| HTTPURLInfo | URLInfo | Preserves params, query, and fragment from the original URL. The url property reconstructs the full URL including these components. Equality comparison includes the extra parts. |
| WebDAVURLInfo | URLInfo | Replaces the webdav scheme with http in the reconstructed URL. |
| WindowsPathInfo | PathInfo, PureWindowsPath | Windows-specific path handling. |
| PosixPathInfo | PathInfo, PurePosixPath | POSIX-specific path handling. |
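The URL-specific rows of the table can be approximated on plain strings with urllib.parse. The helpers cloud_path, http_joinpath, and webdav_to_http are hypothetical and do not use DVC's classes. They only illustrate the leading-slash stripping, the preservation of query/params/fragment across path operations, and the scheme swap.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def cloud_path(url: str) -> str:
    """CloudURLInfo-style path: bucket-relative, with no leading slash."""
    return urlparse(url).path.lstrip("/")


def http_joinpath(url: str, *parts: str) -> str:
    """Join onto an HTTP URL's path while keeping params, query and fragment."""
    p = urlparse(url)
    return urlunparse(p._replace(path=posixpath.join(p.path, *parts)))


def webdav_to_http(url: str) -> str:
    """WebDAVURLInfo-style URL: swap 'webdav' for 'http' in the scheme."""
    p = urlparse(url)
    return urlunparse(p._replace(scheme=p.scheme.replace("webdav", "http")))
```

Because this sketch uses str.replace on the scheme, a "webdavs" scheme would also map to "https".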

Internal helper classes:

  • _BasePath -- mixin providing overlaps() and isin_or_eq() methods.
  • _URLPathInfo -- extends PosixPathInfo with a __str__ that returns the absolute path (not relative).
  • _URLPathParents -- lazy parent iterator for URLInfo, returning new URLInfo instances for each parent level.
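The lazy-parents idea behind _URLPathParents can be sketched as a collections.abc.Sequence that wraps pathlib's own parents sequence and builds each parent URL only when that index is accessed. LazyURLParents and its make callback are hypothetical. The real helper returns URLInfo instances rather than strings.

```python
from collections.abc import Sequence
from pathlib import PurePosixPath
from typing import Callable


class LazyURLParents(Sequence):
    """Hypothetical analogue of _URLPathParents: each parent URL is built on access."""

    def __init__(self, base: str, path: str, make: Callable[[str, str], str]):
        self._base = base  # e.g. "scheme://netloc", shared by every parent
        self._parents = PurePosixPath(path).parents  # pathlib's own parent sequence
        self._make = make  # builds a URL from (base, parent path)

    def __len__(self) -> int:
        return len(self._parents)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(len(self))[index]]
        # Out-of-range indexes raise IndexError from pathlib, which ends iteration.
        return self._make(self._base, str(self._parents[index]))
```

For example, LazyURLParents("ssh://user@host:2222", "/repo/data/file.txt", lambda base, p: base + p) yields the URLs for "/repo/data", "/repo", and "/" in that order.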

Signature

class PathInfo(pathlib.PurePath, _BasePath):
    scheme: str = "local"

    def as_posix(self) -> str: ...
    def __str__(self) -> str: ...
    def __fspath__(self) -> str: ...
    def isin(self, other) -> bool: ...
    def overlaps(self, other) -> bool: ...          # from _BasePath
    def isin_or_eq(self, other) -> bool: ...        # from _BasePath
    def relative_to(self, other, *args, **kwargs): ...
    def relpath(self, other): ...
    @property
    def fspath(self) -> str: ...


class URLInfo(_BasePath):
    DEFAULT_PORTS: ClassVar[dict[str, int]]

    def __init__(self, url: str): ...
    @classmethod
    def from_parts(cls, scheme=None, host=None, user=None,
                   port=None, path="", netloc=None): ...
    def replace(self, path=None) -> "URLInfo": ...
    def isin(self, other) -> bool: ...
    def relative_to(self, other): ...
    def joinpath(self, *args) -> "URLInfo": ...
    @property
    def url(self) -> str: ...
    @property
    def path(self) -> str: ...
    @property
    def name(self) -> str: ...
    @property
    def netloc(self) -> str: ...
    @property
    def bucket(self) -> str: ...
    @property
    def parent(self) -> "URLInfo": ...
    @property
    def parents(self) -> "_URLPathParents": ...


class CloudURLInfo(URLInfo):
    @property
    def path(self) -> str: ...   # strips leading slash


class HTTPURLInfo(URLInfo):
    params: str
    query: str
    fragment: str
    @classmethod
    def from_parts(cls, scheme=None, host=None, user=None, port=None,
                   path="", netloc=None, params=None, query=None,
                   fragment=None): ...


class WebDAVURLInfo(URLInfo):
    @property
    def url(self) -> str: ...    # replaces webdav:// with http://

Import

from dvc.testing.path_info import PathInfo, URLInfo
from dvc.testing.path_info import CloudURLInfo, HTTPURLInfo, WebDAVURLInfo

Example

from dvc.testing.path_info import PathInfo, URLInfo, CloudURLInfo

# Local path operations
p = PathInfo("/home/user/project/data/train.csv")
parent = PathInfo("/home/user/project")
print(p.isin(parent))          # True
print(parent.isin(p))          # False
print(p.overlaps(parent))      # True

# URL path operations
url = URLInfo("ssh://user@host:2222/repo/data")
print(url.scheme)              # "ssh"
print(url.host)                # "host"
print(url.port)                # 2222
print(url.path)                # "/repo/data"

child = url / "subdir" / "file.txt"
print(child.path)              # "/repo/data/subdir/file.txt"
print(child.isin(url))         # True

# Cloud URL strips leading slash
cloud = CloudURLInfo("s3://bucket/path/to/data")
print(cloud.path)              # "path/to/data"
print(cloud.bucket)            # "bucket"
