Implementation:Iterative Dvc Testing Path Info
Domains
Overview
Concrete tool for representing and comparing local filesystem paths and URLs in DVC testing infrastructure. The module dvc/testing/path_info.py provides PathInfo (extending pathlib.PurePath) for local paths and URLInfo for URL-based paths, both supporting overlap detection, containment checking, and relative path computation. Specialized variants handle cloud storage paths, HTTP URLs with query parameters, and WebDAV URLs.
Description
PathInfo (local paths):
PathInfo extends pathlib.PurePath and the _BasePath mixin. On construction, it dispatches to WindowsPathInfo or PosixPathInfo based on the OS (for Python < 3.12). Key behaviors:
- __str__() returns a relative path (via dvc.utils.relpath) rather than the absolute path.
- __fspath__() returns the original absolute path string, preserving compatibility with os.fspath().
- isin(other) checks whether this path is strictly inside another path using casefolded path parts.
- overlaps(other) returns True if either path contains or equals the other.
- relative_to(other) falls back to os.path.relpath when pathlib's strict ancestor check would raise.
- The scheme attribute is always "local".
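The containment rules above can be sketched with the standard library alone. This is an illustration, not the module's code: the helper names `_cparts`, `isin`, and `overlaps` are stand-ins written as free functions, and the sketch casefolds parts unconditionally, which is an assumption (the real class may only do so on case-insensitive platforms).

```python
from pathlib import PurePosixPath


def _cparts(path):
    """Casefolded path components (assumption: always casefolded in this sketch)."""
    return tuple(part.casefold() for part in PurePosixPath(path).parts)


def isin(path, other):
    """True only if `path` lies strictly inside `other`; equal paths do not count."""
    a, b = _cparts(path), _cparts(other)
    return len(a) > len(b) and a[: len(b)] == b


def overlaps(path, other):
    """True if the paths are equal or either one contains the other."""
    return _cparts(path) == _cparts(other) or isin(path, other) or isin(other, path)


print(isin("/home/user/project/data", "/home/user/project"))      # True
print(isin("/home/user/project", "/home/user/project"))           # False
print(overlaps("/home/user/project", "/home/user/project/data"))  # True
print(overlaps("/home/user/a", "/home/user/b"))                   # False
```

Comparing whole components rather than string prefixes is what keeps `/a/bc` from being reported as inside `/a/b`.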
URLInfo (URL paths):
URLInfo represents scheme-based URLs with components: scheme, host, user, port, and a POSIX path. Key behaviors:
- Construction from a URL string via urlparse, or from parts via the from_parts() classmethod.
- Path operations (the / operator, joinpath, parent, parents) produce new URLInfo instances.
- isin(other) checks base parts equality and delegates path containment to the internal _URLPathInfo.
- netloc is computed from host, user, and port (excluding default ports).
- bucket is an alias for netloc.
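As a rough illustration of the netloc rule, the sketch below rebuilds a network location from the pieces that `urllib.parse.urlparse` exposes and drops the port when it matches the scheme's default. The function name `build_netloc` and the contents of the port table are assumptions made for this example; the module's own DEFAULT_PORTS mapping may differ.

```python
from urllib.parse import urlparse

# Hypothetical default-port table for this sketch only; the real
# DEFAULT_PORTS mapping in the module may contain different entries.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}


def build_netloc(url):
    """Rebuild the network location from user, host, and port, omitting a default port."""
    parsed = urlparse(url)
    netloc = parsed.hostname or ""
    if parsed.username:
        netloc = f"{parsed.username}@{netloc}"
    # Keep the port only when it is set and differs from the scheme default.
    if parsed.port and parsed.port != DEFAULT_PORTS.get(parsed.scheme):
        netloc = f"{netloc}:{parsed.port}"
    return netloc


print(build_netloc("ssh://user@host:2222/repo/data"))  # user@host:2222
print(build_netloc("ssh://user@host:22/repo/data"))    # user@host
print(build_netloc("https://example.com:443/x"))       # example.com
```

Dropping default ports means two URLs that differ only by an explicit default port produce the same network location.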
Specialized variants:
| Class | Inherits From | Behavior |
|---|---|---|
| CloudURLInfo | URLInfo | Strips the leading slash from the path property, suitable for S3/GCS-style bucket paths. |
| HTTPURLInfo | URLInfo | Preserves params, query, and fragment from the original URL. The url property reconstructs the full URL including these components. Equality comparison includes the extra parts. |
| WebDAVURLInfo | URLInfo | Replaces the webdav scheme with http in the reconstructed URL. |
| WindowsPathInfo | PathInfo, PureWindowsPath | Windows-specific path handling. |
| PosixPathInfo | PathInfo, PurePosixPath | POSIX-specific path handling. |
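The three URL variants in the table differ only in small string-level details, which the standard-library sketch below tries to make concrete. The function names (`cloud_path`, `rebuild_http_url`, `webdav_to_http`) are hypothetical and do not exist in the module. The sketch swaps the scheme with `str.replace`, so `webdavs` would become `https`; whether the module handles `webdavs` that way is an assumption.

```python
from urllib.parse import urlparse, urlunparse


def cloud_path(url):
    """Object key for a bucket-style URL: the URL path without its leading slash."""
    return urlparse(url).path.lstrip("/")


def rebuild_http_url(url):
    """Reassemble an HTTP URL, keeping params, query, and fragment when present."""
    parsed = urlparse(url)
    rebuilt = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    if parsed.params:
        rebuilt += f";{parsed.params}"
    if parsed.query:
        rebuilt += f"?{parsed.query}"
    if parsed.fragment:
        rebuilt += f"#{parsed.fragment}"
    return rebuilt


def webdav_to_http(url):
    """Swap the webdav scheme for http, leaving the rest of the URL intact."""
    parsed = urlparse(url)
    return urlunparse(parsed._replace(scheme=parsed.scheme.replace("webdav", "http")))


print(cloud_path("s3://bucket/path/to/data"))                      # path/to/data
print(rebuild_http_url("https://example.com/data?token=abc#top"))  # https://example.com/data?token=abc#top
print(webdav_to_http("webdav://example.com/dav/file.txt"))         # http://example.com/dav/file.txt
```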
Internal helper classes:
- _BasePath -- mixin providing overlaps() and isin_or_eq() methods.
- _URLPathInfo -- extends PosixPathInfo with a __str__ that returns the absolute path (not relative).
- _URLPathParents -- lazy parent iterator for URLInfo, returning new URLInfo instances for each parent level.
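To show what a lazy parent sequence can look like, here is a minimal sketch that builds each parent URL only when it is indexed. The class `URLParents` is a hypothetical stand-in that returns plain strings; the module's _URLPathParents is described as returning URLInfo instances, and its internals may differ.

```python
from collections.abc import Sequence


class URLParents(Sequence):
    """Hypothetical stand-in: computes each parent URL only when it is requested."""

    def __init__(self, base, path):
        self._base = base  # scheme and network location, e.g. "ssh://host"
        self._parts = [part for part in path.split("/") if part]

    def __len__(self):
        # One parent per path component, ending at the root "/".
        return len(self._parts)

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        # Parent number idx drops the last idx + 1 components.
        kept = self._parts[: len(self._parts) - idx - 1]
        return f"{self._base}/" + "/".join(kept)


parents = URLParents("ssh://host", "/repo/data/file.txt")
print(parents[0])     # ssh://host/repo/data
print(list(parents))  # ['ssh://host/repo/data', 'ssh://host/repo', 'ssh://host/']
```

The ordering mirrors `pathlib`'s `parents`: index 0 is the immediate parent and the last entry is the root.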
Signature
class PathInfo(pathlib.PurePath, _BasePath):
    scheme: str = "local"
    def as_posix(self) -> str: ...
    def __str__(self) -> str: ...
    def __fspath__(self) -> str: ...
    def isin(self, other) -> bool: ...
    def overlaps(self, other) -> bool: ...  # from _BasePath
    def isin_or_eq(self, other) -> bool: ...  # from _BasePath
    def relative_to(self, other, *args, **kwargs): ...
    def relpath(self, other): ...
    @property
    def fspath(self) -> str: ...

class URLInfo(_BasePath):
    DEFAULT_PORTS: ClassVar[dict[str, int]]
    def __init__(self, url: str): ...
    @classmethod
    def from_parts(cls, scheme=None, host=None, user=None,
                   port=None, path="", netloc=None): ...
    def replace(self, path=None) -> "URLInfo": ...
    def isin(self, other) -> bool: ...
    def relative_to(self, other): ...
    def joinpath(self, *args) -> "URLInfo": ...
    @property
    def url(self) -> str: ...
    @property
    def path(self) -> str: ...
    @property
    def name(self) -> str: ...
    @property
    def netloc(self) -> str: ...
    @property
    def bucket(self) -> str: ...
    @property
    def parent(self) -> "URLInfo": ...
    @property
    def parents(self) -> "_URLPathParents": ...

class CloudURLInfo(URLInfo):
    @property
    def path(self) -> str: ...  # strips leading slash

class HTTPURLInfo(URLInfo):
    params: str
    query: str
    fragment: str
    @classmethod
    def from_parts(cls, ..., params=None, query=None, fragment=None): ...

class WebDAVURLInfo(URLInfo):
    @property
    def url(self) -> str: ...  # replaces webdav:// with http://
Import
from dvc.testing.path_info import PathInfo, URLInfo
from dvc.testing.path_info import CloudURLInfo, HTTPURLInfo, WebDAVURLInfo
Example
from dvc.testing.path_info import PathInfo, URLInfo, CloudURLInfo
# Local path operations
p = PathInfo("/home/user/project/data/train.csv")
parent = PathInfo("/home/user/project")
print(p.isin(parent)) # True
print(parent.isin(p)) # False
print(p.overlaps(parent)) # True
# URL path operations
url = URLInfo("ssh://user@host:2222/repo/data")
print(url.scheme) # "ssh"
print(url.host) # "host"
print(url.port) # 2222
print(url.path) # "/repo/data"
child = url / "subdir" / "file.txt"
print(child.path) # "/repo/data/subdir/file.txt"
print(child.isin(url)) # True
# Cloud URL strips leading slash
cloud = CloudURLInfo("s3://bucket/path/to/data")
print(cloud.path) # "path/to/data"
print(cloud.bucket) # "bucket"
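
The relative_to fallback mentioned in the Description can also be sketched without DVC installed. The free function below is a hypothetical stand-in for the method and uses `posixpath.relpath` directly, where the module is described as going through its own relpath helper.

```python
import posixpath
from pathlib import PurePosixPath


def relative_to(path, other):
    """Relative path from `other` to `path`, even when `other` is not an ancestor."""
    try:
        # Succeeds only when `other` is an ancestor of `path`.
        return PurePosixPath(path).relative_to(other)
    except ValueError:
        # pathlib rejects non-ancestors; relpath walks up with ".." instead.
        return PurePosixPath(posixpath.relpath(path, other))


print(relative_to("/a/b/c", "/a/b"))  # c
print(relative_to("/a/b", "/a/x/y"))  # ../../b
```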