

Implementation:Iterative Dvc Repo Ls Url

From Leeroopedia
Revision as of 15:19, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Iterative_Dvc_Repo_Ls_Url.md)


Knowledge Sources
Domains: Data_Access, Remote_Storage
Last Updated: 2026-02-10 10:00 GMT

Overview

Repo_Ls_Url provides a lightweight function for listing files at an external URL without requiring a DVC repository. It is implemented in dvc/repo/ls_url.py (41 lines) and exposes a single public function ls_url().

from dvc.repo.ls_url import ls_url

Unlike dvc.repo.ls, this module operates directly on remote or local filesystems parsed from a URL, with no DVC metadata overlay. It is the backend for the dvc ls-url CLI command.

Public Function

ls_url()

Lists files at a given URL, supporting local paths and remote storage protocols (S3, GCS, Azure, SSH, HTTP, etc.).

Signature:

def ls_url(
    url: str,
    *,
    fs_config: Optional[dict] = None,
    recursive: bool = False,
    maxdepth: Optional[int] = None,
    config: Optional[dict] = None,
) -> list[dict[str, Any]]:

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The URL to list (local path or remote protocol URL) |
| fs_config | Optional[dict] | None | Filesystem-specific configuration (credentials, etc.) |
| recursive | bool | False | Recursively list files in subdirectories |
| maxdepth | Optional[int] | None | Maximum recursion depth |
| config | Optional[dict] | None | DVC config for remote resolution |
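As a usage sketch, the keyword-only options can be combined as shown here. The wrapper name list_depth_limited and its ls_url_fn parameter are hypothetical, introduced only so the call pattern can be exercised without a DVC installation; the only thing taken from this page is the ls_url() signature.

```python
from typing import Any, Callable, Optional


def list_depth_limited(
    url: str,
    depth: int,
    ls_url_fn: Optional[Callable[..., list[dict[str, Any]]]] = None,
) -> list[dict[str, Any]]:
    """Recursively list `url`, descending at most `depth` levels.

    `ls_url_fn` is a hypothetical injection point (useful for tests).
    When omitted, the real function is imported, which requires DVC.
    """
    if ls_url_fn is None:
        # Deferred import so this module loads even without DVC installed.
        from dvc.repo.ls_url import ls_url as ls_url_fn

    # Everything after `url` is keyword-only in the documented signature.
    return ls_url_fn(url, recursive=True, maxdepth=depth)
```

With DVC installed, list_depth_limited(some_url, 2) would forward to ls_url(some_url, recursive=True, maxdepth=2).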

Return value: A list of dictionaries, each with the structure:

{
    "path": str,    # path relative to the listed URL
    "isdir": bool,  # whether the entry is a directory
    "size": Optional[int],  # file size, or None if unavailable
}
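A minimal sketch of consuming entries of this shape follows. The LsUrlEntry type and the total_size helper are hypothetical names introduced for illustration; the function itself returns plain dictionaries.

```python
from typing import Optional, TypedDict


class LsUrlEntry(TypedDict):
    """Hypothetical type describing one entry of the documented shape."""

    path: str
    isdir: bool
    size: Optional[int]


def total_size(entries: list[LsUrlEntry]) -> int:
    """Sum the sizes of file entries, skipping directories and unknown sizes."""
    total = 0
    for entry in entries:
        size = entry.get("size")  # tolerate a missing or None size
        if not entry["isdir"] and size is not None:
            total += size
    return total
```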

Exceptions:

  • dvc.exceptions.URLMissingError -- raised when the URL does not exist (wraps FileNotFoundError)
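The wrapping pattern can be sketched with Python's exception chaining. MissingURLError below is a hypothetical stand-in for dvc.exceptions.URLMissingError, and stat_or_raise is a hypothetical helper that only handles local paths; the actual DVC code may differ in message and detail.

```python
import os


class MissingURLError(Exception):
    """Hypothetical stand-in for dvc.exceptions.URLMissingError."""

    def __init__(self, url: str) -> None:
        super().__init__(f"The path '{url}' does not exist")
        self.url = url


def stat_or_raise(url: str) -> os.stat_result:
    """Stat a local path, translating a missing path into MissingURLError."""
    try:
        return os.stat(url)
    except FileNotFoundError as exc:
        # `from exc` keeps the original error reachable via __cause__.
        raise MissingURLError(url) from exc
```

Callers can then catch the domain-specific error while the original FileNotFoundError stays available for debugging.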

Execution Flow

  1. parse_external_url() is called to resolve the URL into a filesystem instance (fs) and filesystem path (fs_path).
  2. fs.info() is called on the path. If the path does not exist, URLMissingError is raised.
  3. If maxdepth == 0 or the path is a file (not a directory), a single-entry list is returned immediately.
  4. Otherwise, the function walks the directory using fs.walk() (or _LocalFileSystem().walk() for local paths, since DVC's LocalFileSystem does not support maxdepth).
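  5. At each level of the walk, file entries are collected. When the listing is not recursive, or when the current level is the last one allowed by maxdepth, that level's directory entries are added to the listing as well, since the walk will not descend into them.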
  6. If not recursive, the walk breaks after the first iteration (top-level only).
  7. Entries are returned as dictionaries with path (relative to the listed URL), isdir, and size.
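The steps above can be approximated for local directories with the standard library. This is a sketch under stated assumptions, not the DVC implementation: list_local is a hypothetical name, os.walk stands in for fs.walk(), FileNotFoundError stands in for URLMissingError, and the contents of the single-entry result and the per-directory sorting are choices made for this sketch.

```python
import os
from typing import Any, Optional


def list_local(
    root: str, *, recursive: bool = False, maxdepth: Optional[int] = None
) -> list[dict[str, Any]]:
    """Local-only approximation of the listing flow (hypothetical helper)."""
    if not os.path.exists(root):
        # The documented function raises URLMissingError at this point.
        raise FileNotFoundError(root)

    root_is_dir = os.path.isdir(root)
    if maxdepth == 0 or not root_is_dir:
        # Single-entry result; its exact fields are an assumption here.
        size = None if root_is_dir else os.path.getsize(root)
        return [{"path": root, "isdir": root_is_dir, "size": size}]

    entries: list[dict[str, Any]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else len(rel.split(os.sep))
        # True when this is the last level that maxdepth allows.
        at_boundary = maxdepth is not None and depth >= maxdepth - 1

        names = list(filenames)
        if not recursive or at_boundary:
            # The walk will not descend further, so list directories too.
            names += dirnames

        for name in sorted(names):
            full = os.path.join(dirpath, name)
            isdir = os.path.isdir(full)
            entries.append(
                {
                    "path": os.path.relpath(full, root),
                    "isdir": isdir,
                    "size": None if isdir else os.path.getsize(full),
                }
            )

        if not recursive:
            break  # top level only
        if at_boundary:
            dirnames[:] = []  # prune so os.walk stops descending

    return entries
```

In this sketch, a recursive listing reports a directory as an entry only at the maxdepth boundary; above that boundary its contents appear instead.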

Key Design Decisions

  • No DVC metadata: Unlike ls(), this function does not overlay DVC tracking information. The output only contains filesystem-level attributes (path, isdir, size).
  • LocalFileSystem workaround: DVC's own LocalFileSystem does not support the maxdepth parameter in walk(), so the function falls back to fsspec's native LocalFileSystem implementation when operating on local paths.
  • Consistent walk logic: The directory/file merging logic at maxdepth boundaries mirrors the pattern used in dvc.repo.ls._ls(), ensuring consistent behavior across both listing functions.

Dependencies

  • dvc.fs.parse_external_url -- URL-to-filesystem resolution
  • dvc.fs.LocalFileSystem -- DVC local filesystem wrapper
  • fsspec.implementations.local.LocalFileSystem -- native fsspec local filesystem (used as maxdepth workaround)
  • dvc.exceptions.URLMissingError -- error for missing URLs
