Heuristic:Datahub project Datahub Venv Copies Mode
| Knowledge Sources | |
|---|---|
| Domains | Development, Python |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Set DATAHUB_VENV_USE_COPIES=true when developing on Nix, immutable filesystems, Windows, or container environments where Python virtual environment symlinks fail.
Description
By default, Python virtual environments (venv) use symlinks to the system Python binary. This fails on certain filesystems and platforms: Nix (immutable store), some container environments, Windows (limited symlink support), and read-only mounted volumes. The DataHub build system supports a DATAHUB_VENV_USE_COPIES environment variable that switches venv creation to use --copies mode, copying the Python binary instead of symlinking it.
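For readers who want to see what copies mode means at the Python level, here is a minimal sketch using the standard-library venv API. It assumes nothing about DataHub's build scripts; `create_copied_venv` is a hypothetical helper name, and `with_pip=False` is chosen only to keep the example quick.

```python
import venv
from pathlib import Path


def create_copied_venv(target_dir):
    """Create a venv at target_dir that copies the interpreter instead of symlinking it.

    Hypothetical helper for illustration; not part of the DataHub build.
    """
    # symlinks=False is the EnvBuilder counterpart of `python -m venv --copies`.
    # with_pip=False keeps the sketch fast; a real dev environment would want pip.
    builder = venv.EnvBuilder(symlinks=False, with_pip=False)
    builder.create(target_dir)
    return Path(target_dir)
```

With `symlinks=False`, the interpreter inside the environment is a regular file rather than a link back to the system Python, which is the behavior the environment variable is meant to select.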
Usage
Apply this heuristic when virtual environment creation fails with symlink-related errors, or proactively on Nix/NixOS systems and Windows development environments.
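One rough way to decide whether to apply the heuristic proactively is to probe whether the directory that will hold the venv accepts symlinks at all. The sketch below is an assumption-laden check, not something the DataHub build performs: `symlinks_supported` is a hypothetical helper, and it only tests symlink creation inside the given directory, not whether a symlinked interpreter would actually run.

```python
import os
import tempfile
from pathlib import Path


def symlinks_supported(directory):
    """Return True if a symlink can be created inside `directory`.

    Hypothetical probe for illustration only.
    """
    try:
        # Work inside a scratch subdirectory so nothing is left behind.
        with tempfile.TemporaryDirectory(dir=directory) as tmp:
            target = Path(tmp) / "target"
            target.write_text("probe")
            link = Path(tmp) / "link"
            os.symlink(target, link)
            return link.is_symlink()
    except (OSError, NotImplementedError):
        # Read-only volumes, missing privileges, or unsupported filesystems land here.
        return False
```

If the probe returns False for the directory where the venv will be created, setting DATAHUB_VENV_USE_COPIES=true up front is likely to save a failed build.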
The Insight (Rule of Thumb)
- Action: Set export DATAHUB_VENV_USE_COPIES=true before running ./gradlew :metadata-ingestion:installDev
- Value: Boolean flag (any truthy value works)
- Trade-off: Increases disk usage (the full Python binary is copied) and slightly slows venv creation, but eliminates symlink failures.
Reasoning
The Python venv module defaults to symlinks for efficiency, but this creates a hard dependency on the symlink capability of the filesystem. Nix stores are immutable by design, making symlinks into them impossible from writable directories. The --copies flag is the official Python workaround for this scenario. The DataHub Gradle build reads the DATAHUB_VENV_USE_COPIES environment variable and passes the appropriate flag during venv creation.
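The mapping from environment variable to command-line flag can be sketched as follows. This is an illustration in Python rather than the actual Gradle logic: the function name `venv_command` and the set of values treated as truthy are assumptions, since the source only says that any truthy value works.

```python
import os
import sys

# Assumption: which strings count as truthy is not specified by the source.
TRUTHY_VALUES = {"1", "true", "yes", "on"}


def venv_command(target_dir, environ=None):
    """Build a `python -m venv` command, adding --copies when the flag is set.

    Hypothetical helper; the real build performs this step in Gradle.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("DATAHUB_VENV_USE_COPIES", "")
    use_copies = raw.strip().lower() in TRUTHY_VALUES

    cmd = [sys.executable, "-m", "venv"]
    if use_copies:
        # --copies asks venv to copy the interpreter instead of symlinking it.
        cmd.append("--copies")
    cmd.append(target_dir)
    return cmd
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.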