Implementation:Iterative Dvc Repo Ls Url
| Knowledge Sources | |
|---|---|
| Domains | Data_Access, Remote_Storage |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Repo_Ls_Url provides a lightweight function for listing files at an external URL without requiring a DVC repository. It is implemented in dvc/repo/ls_url.py (41 lines) and exposes a single public function ls_url().
from dvc.repo.ls_url import ls_url
Unlike dvc.repo.ls, this module operates directly on remote or local filesystems parsed from a URL, with no DVC metadata overlay. It is the backend for the dvc ls-url CLI command.
Public Function
ls_url()
Lists files at a given URL, supporting local paths and remote storage protocols (S3, GCS, Azure, SSH, HTTP, etc.).
Signature:
def ls_url(
    url: str,
    *,
    fs_config: Optional[dict] = None,
    recursive: bool = False,
    maxdepth: Optional[int] = None,
    config: Optional[dict] = None,
) -> list[dict[str, Any]]:
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The URL to list (local path or remote protocol URL) |
| fs_config | Optional[dict] | None | Filesystem-specific configuration (credentials, etc.) |
| recursive | bool | False | Recursively list files in subdirectories |
| maxdepth | Optional[int] | None | Maximum recursion depth |
| config | Optional[dict] | None | DVC config for remote resolution |
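As a minimal usage sketch, the keyword-only parameters can be combined as shown here. The wrapper name list_shallow and its lister argument are hypothetical, introduced only so the snippet can be exercised without DVC installed; when lister is omitted, it falls back to the ls_url import shown in the Overview.

```python
from typing import Any, Callable, Optional

Entry = dict[str, Any]


def list_shallow(
    url: str,
    lister: Optional[Callable[..., list[Entry]]] = None,
) -> list[Entry]:
    """List `url` recursively, but no deeper than two levels.

    `list_shallow` and `lister` are illustrative names, not part of DVC.
    """
    if lister is None:
        # Real backend; this import requires DVC to be installed.
        from dvc.repo.ls_url import ls_url as lister
    # Everything after `url` is keyword-only in the documented signature.
    return lister(url, recursive=True, maxdepth=2)
```

Passing a stub as lister keeps the call shape testable in environments where no remote storage or DVC installation is available.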
Return value: A list of dictionaries, each with the structure:
{
    "path": str,    # path relative to the listed URL
    "isdir": bool,  # whether the entry is a directory
    "size": int,    # file size (or None)
}
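The following sketch shows one way to consume entries of this shape. The sample list is hand-written for illustration (it is not captured output from a real call), and the helper names total_size and format_entry are hypothetical.

```python
from typing import Any

# Hand-written sample shaped like the documented return structure.
entries: list[dict[str, Any]] = [
    {"path": "raw", "isdir": True, "size": None},
    {"path": "data.csv", "isdir": False, "size": 1200},
    {"path": "README.md", "isdir": False, "size": 300},
]


def total_size(items: list[dict[str, Any]]) -> int:
    """Sum file sizes, skipping directories and entries without a size."""
    return sum(
        item["size"]
        for item in items
        if not item["isdir"] and item["size"] is not None
    )


def format_entry(item: dict[str, Any]) -> str:
    """Render one entry, marking directories with a trailing slash."""
    return item["path"] + ("/" if item["isdir"] else "")
```

Because size may be None, any aggregation over the result should filter on it explicitly rather than assume an integer.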
Exceptions:
- dvc.exceptions.URLMissingError -- raised when the URL does not exist (wraps FileNotFoundError)
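The wrapping pattern can be sketched with the standard library alone. The URLMissingError class below is a local stand-in with an assumed message format, not the real dvc.exceptions class, and stat_or_raise is a hypothetical helper that uses os.stat in place of fs.info().

```python
import os


class URLMissingError(Exception):
    """Stand-in for dvc.exceptions.URLMissingError (not the real class)."""

    def __init__(self, url: str) -> None:
        # The message text is an assumption made for this sketch.
        super().__init__(f"The path '{url}' does not exist")
        self.url = url


def stat_or_raise(url: str) -> os.stat_result:
    """Look up a local path, converting a missing path into URLMissingError."""
    try:
        return os.stat(url)
    except FileNotFoundError as exc:
        # Chain the original error so the low-level cause stays visible.
        raise URLMissingError(url) from exc
```

Callers can then catch a single domain-level exception while the original FileNotFoundError remains reachable through __cause__.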
Execution Flow
- parse_external_url() is called to resolve the URL into a filesystem instance (fs) and filesystem path (fs_path).
- fs.info() is called on the path. If the path does not exist, URLMissingError is raised.
- If maxdepth == 0 or the path is a file (not a directory), a single-entry list is returned immediately.
- Otherwise, the function walks the directory using fs.walk() (or _LocalFileSystem().walk() for local paths, since DVC's LocalFileSystem does not support maxdepth).
- For each level, files are collected. When not recursive or when maxdepth has been reached, directory entries are merged into the file listing.
- If not recursive, the walk breaks after the first iteration (top-level only).
- Entries are returned as dictionaries with path (relative to the listed URL), isdir, and size.
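The flow above can be approximated for local directories with os.walk. This is a standard-library sketch, not DVC's code: ls_local is a hypothetical name, depth limiting is emulated by pruning the walk rather than passing maxdepth to fs.walk(), directory sizes are reported as None, and a plain FileNotFoundError stands in for URLMissingError.

```python
import os
from typing import Any, Optional


def ls_local(
    path: str, recursive: bool = False, maxdepth: Optional[int] = None
) -> list[dict[str, Any]]:
    """Stdlib approximation of the documented flow (not DVC's code)."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)  # ls_url raises URLMissingError here

    # A file, or maxdepth == 0: return a single entry for the path itself.
    if maxdepth == 0 or not os.path.isdir(path):
        is_dir = os.path.isdir(path)
        size = None if is_dir else os.path.getsize(path)
        return [{"path": path, "isdir": is_dir, "size": size}]

    entries: list[dict[str, Any]] = []
    for root, dirs, files in os.walk(path):
        rel_root = os.path.relpath(root, path)
        depth = 0 if rel_root == "." else rel_root.count(os.sep) + 1
        at_limit = maxdepth is not None and depth >= maxdepth - 1

        for name in sorted(files):
            full = os.path.join(root, name)
            entries.append({
                "path": os.path.relpath(full, path),
                "isdir": False,
                "size": os.path.getsize(full),
            })

        if not recursive or at_limit:
            # Directories are reported as entries instead of being descended into.
            for name in sorted(dirs):
                full = os.path.join(root, name)
                entries.append({
                    "path": os.path.relpath(full, path),
                    "isdir": True,
                    "size": None,
                })
            dirs[:] = []  # prune so os.walk does not go deeper

        if not recursive:
            break  # top-level only
    return entries
```

In this sketch, a fully recursive listing with no maxdepth contains only files, while directories appear only at the top level of a non-recursive listing or at the depth boundary.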
Key Design Decisions
- No DVC metadata: Unlike ls(), this function does not overlay DVC tracking information. The output only contains filesystem-level attributes (path, isdir, size).
- LocalFileSystem workaround: DVC's own LocalFileSystem does not support the maxdepth parameter in walk(), so the function falls back to fsspec's native LocalFileSystem implementation when operating on local paths.
- Consistent walk logic: The directory/file merging logic at maxdepth boundaries mirrors the pattern used in dvc.repo.ls._ls(), ensuring consistent behavior across both listing functions.
Dependencies
- dvc.fs.parse_external_url -- URL-to-filesystem resolution
- dvc.fs.LocalFileSystem -- DVC local filesystem wrapper
- fsspec.implementations.local.LocalFileSystem -- native fsspec local filesystem (used as maxdepth workaround)
- dvc.exceptions.URLMissingError -- error for missing URLs