Heuristic:Datahub project Datahub Venv Copies Mode

From Leeroopedia
Revision as of 10:40, 16 February 2026 by Admin (talk | contribs) (Auto-imported from heuristics/Datahub_project_Datahub_Venv_Copies_Mode.md)

Knowledge Sources
Domains: Development, Python
Last Updated: 2026-02-09 17:00 GMT

Overview

Set DATAHUB_VENV_USE_COPIES=true when developing on Nix, immutable filesystems, Windows, or container environments where Python virtual environment symlinks fail.

Description

By default, Python virtual environments (venv) symlink to the system Python binary on platforms where symlinks are the default. Symlinked environments can fail on certain filesystems and platforms: Nix (immutable store), some container environments, Windows (limited symlink support), and read-only mounted volumes. The DataHub build system supports a DATAHUB_VENV_USE_COPIES environment variable that switches venv creation to --copies mode, which copies the Python binary instead of symlinking it.
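The mapping from the environment variable to the venv flag can be pictured with a small sketch. This is not DataHub's actual Gradle code, which this page does not show; the helper name build_venv_command and the set of values treated as truthy are assumptions made for illustration. Only python -m venv --copies itself is standard Python.

```python
import sys

# Assumption: which strings count as "truthy" is illustrative only; the
# real build may simply check whether the variable is set.
TRUTHY = {"1", "true", "yes", "on"}


def build_venv_command(env, venv_dir, python=sys.executable):
    """Return the argv for creating a venv, adding --copies when requested.

    `env` is a mapping such as os.environ. Hypothetical helper, not part
    of DataHub.
    """
    cmd = [python, "-m", "venv"]
    if env.get("DATAHUB_VENV_USE_COPIES", "").strip().lower() in TRUTHY:
        # --copies is the standard venv option that copies the interpreter
        # into the environment instead of symlinking it.
        cmd.append("--copies")
    cmd.append(venv_dir)
    return cmd
```

Calling build_venv_command(os.environ, "venv") after importing os would yield a command list that can be passed to subprocess.run.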

Usage

Apply this heuristic when virtual environment creation fails with symlink-related errors, or proactively on Nix/NixOS systems and Windows development environments.

The Insight (Rule of Thumb)

  • Action: Run export DATAHUB_VENV_USE_COPIES=true before running ./gradlew :metadata-ingestion:installDev
  • Value: Boolean flag (any truthy value works)
  • Trade-off: Uses more disk space (the full Python binary is copied) and makes venv creation slightly slower, but eliminates symlink failures.
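For scripted setups such as CI jobs, the same action can be expressed without modifying the calling shell. The sketch below is a hypothetical wrapper, not part of DataHub: the function names are invented, and it assumes the script runs from the repository root on a POSIX system where ./gradlew is executable.

```python
import os
import subprocess


def env_with_copies(base_env=None):
    """Return a copy of the environment with copies mode enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATAHUB_VENV_USE_COPIES"] = "true"
    return env


def install_dev(repo_root="."):
    """Run the installDev task with the variable set for that process only."""
    return subprocess.run(
        ["./gradlew", ":metadata-ingestion:installDev"],
        cwd=repo_root,
        env=env_with_copies(),
        check=True,  # raise CalledProcessError if Gradle exits non-zero
    )
```

Passing the variable through env keeps the setting scoped to the Gradle process, so other commands in the same session are unaffected.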

Reasoning

The Python venv module defaults to symlinks for efficiency, but this makes venv creation depend on the filesystem's symlink support. Nix stores are immutable by design, which can break venvs that rely on symlinks into them. The --copies flag is the standard venv option for this scenario. The DataHub Gradle build reads the DATAHUB_VENV_USE_COPIES environment variable and passes the appropriate flag during venv creation.
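The difference between the two modes can be observed directly with the standard-library venv module, where EnvBuilder(symlinks=False) corresponds to --copies. The sketch below is illustrative only; the helper name interpreter_is_symlink is an assumption, and pip installation is skipped to keep the run fast.

```python
import os
import tempfile
import venv


def interpreter_is_symlink(use_symlinks):
    """Create a throwaway venv and report whether its python is a symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "venv")
        # symlinks=False is the programmatic equivalent of --copies.
        venv.EnvBuilder(symlinks=use_symlinks, with_pip=False).create(env_dir)
        bin_dir = "Scripts" if os.name == "nt" else "bin"
        exe = "python.exe" if os.name == "nt" else "python"
        return os.path.islink(os.path.join(env_dir, bin_dir, exe))
```

With use_symlinks=False the interpreter is a regular file inside the environment, so the function should return False. With use_symlinks=True on a typical Linux or macOS system it is expected to return True, provided the filesystem allows symlinks.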
