

Environment:Sdv dev SDV Python Runtime

From Leeroopedia
Domains: Synthetic_Data, Infrastructure
Last Updated: 2026-02-14 19:00 GMT

Overview

Python 3.9–3.14 environment with pandas, numpy, copulas, ctgan, deepecho, rdt, sdmetrics, and supporting libraries for synthetic data generation.

Description

This environment provides the full runtime context for the SDV (Synthetic Data Vault) library. It is a CPU-based Python environment by default, with optional GPU acceleration for the neural-network synthesizers built on the ctgan package (CTGAN, TVAE, CopulaGAN). The dependency matrix is Python-version-aware: the minimum versions of numpy, pandas, copulas, ctgan, deepecho, rdt, and sdmetrics depend on the Python interpreter version. System-level packages (graphviz, pandoc) are needed for metadata visualization and documentation generation.
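GPU use is optional and relies on PyTorch, which ctgan is built on. A minimal sketch for checking whether a CUDA device is visible before enabling GPU training; it queries PyTorch directly, uses no SDV API, and the helper name `gpu_available` is hypothetical. How a particular synthesizer is switched to the GPU is not shown here.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device.

    Hypothetical helper: it does not call into SDV or ctgan, only into torch.
    """
    # Without torch there is no deep-learning backend to accelerate.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA GPU available:", gpu_available())
```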

Usage

Use this environment for all SDV workflows: single-table synthesis, multi-table synthesis, sequential data synthesis, constrained synthesis, and data quality evaluation. Every Implementation page in this wiki requires this environment as the base runtime.

System Requirements

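| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Cross-platform; Linux recommended for production |
| Python | >= 3.9, < 3.15 | Supports 3.9, 3.10, 3.11, 3.12, 3.13, 3.14 |
| Disk | 500 MB+ | For package installation and model caching; a default PyTorch install (pulled in via ctgan) that includes CUDA libraries can take several GB |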
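The Python constraint can be checked up front with the standard library. This sketch hard-codes the `>=3.9,<3.15` range from the table above; `python_supported` is a hypothetical helper, not part of SDV.

```python
import sys

# Supported interpreter range, mirroring requires-python = '>=3.9,<3.15'.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 15)


def python_supported(version_info=None) -> bool:
    """Return True if the (major, minor) version falls inside the supported range."""
    major, minor = (version_info or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) < MAX_PYTHON_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if python_supported() else "NOT supported"
    print(f"Python {current} is {status} by this environment")
```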

Dependencies

System Packages

  • `graphviz` — Required for metadata visualization (graph rendering)
  • `pandoc` — Required for documentation generation
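A quick way to confirm that both system packages are present is to look for their executables on `PATH`. This sketch assumes graphviz is detected through its `dot` renderer and pandoc through the `pandoc` binary; `missing_system_packages` is a hypothetical helper.

```python
import shutil

# System package -> executable it is expected to provide (assumed mapping).
REQUIRED_BINARIES = {"graphviz": "dot", "pandoc": "pandoc"}


def missing_system_packages(binaries=REQUIRED_BINARIES) -> list:
    """Return the system packages whose executable cannot be found on PATH."""
    return [pkg for pkg, exe in binaries.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = missing_system_packages()
    print("Missing system packages:", ", ".join(missing) if missing else "none")
```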

Python Packages (Core)

  • `boto3` >= 1.28, < 2.0.0
  • `botocore` >= 1.31, < 2.0.0
  • `cloudpickle` >= 2.1.0 (Python < 3.14) or >= 3.1.1 (Python >= 3.14)
  • `graphviz` >= 0.13.2
  • `numpy` >= 1.22.2 (Python 3.9) / >= 1.24.0 (3.10–3.11) / >= 1.26.0 (3.12) / >= 2.1.0 (3.13) / >= 2.3.2 (3.14)
  • `pandas` >= 1.4.0 (Python < 3.11) / >= 1.5.0 (3.11) / >= 2.1.1 (3.12) / >= 2.2.3 (3.13) / >= 2.3.3 (3.14); < 3 on all Python versions
  • `tqdm` >= 4.29
  • `copulas` >= 0.12.1 (Python < 3.14) or >= 0.14.0 (Python >= 3.14)
  • `ctgan` >= 0.11.1 (Python < 3.14) or >= 0.12.0 (Python >= 3.14)
  • `deepecho` >= 0.7.0 (Python < 3.14) or >= 0.8.0 (Python >= 3.14)
  • `rdt` >= 1.18.2 (Python < 3.14) or >= 1.20.0 (Python >= 3.14)
  • `sdmetrics` >= 0.21.0 (Python < 3.14) or >= 0.26.0 (Python >= 3.14)
  • `platformdirs` >= 4.0
  • `pyyaml` >= 6.0.1
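Because these floors differ by interpreter, it can help to compare what is installed against them. A rough sketch using `importlib.metadata`; the naive version parser is a stated simplification (it ignores pre-release suffixes and does not check upper bounds such as `pandas < 3`), and `meets_minimum`/`check_minimums` are hypothetical names. For exact PEP 440 ordering, `packaging.version.Version` is the usual tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def _release_tuple(text: str) -> tuple:
    """Parse leading numeric release components, e.g. '2.2.3' -> (2, 2, 3).

    Simplification: suffixes such as 'rc1' or '.dev0' are dropped, so a
    pre-release compares equal to its final release.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (zero-padded comparison)."""
    have, need = _release_tuple(installed), _release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_minimums(minimums: dict) -> dict:
    """Map each distribution name to 'ok', 'too old (X)', or 'missing'."""
    report = {}
    for name, floor in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, floor) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    # Floors for Python < 3.14 from the list above.
    floors = {"copulas": "0.12.1", "ctgan": "0.11.1", "deepecho": "0.7.0",
              "rdt": "1.18.2", "sdmetrics": "0.21.0"}
    for name, status in check_minimums(floors).items():
        print(f"{name}: {status}")
```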

Python Packages (Optional)

  • `pomegranate` >= 0.15, < 1 — For Bayesian network distributions
  • `pandas[excel]` — For Excel I/O support

Credentials

No credentials are required for core SDV functionality. However:

  • AWS credentials (via `boto3`): Only needed if loading demo datasets from non-public S3 buckets. The `download_demo` function uses boto3 for S3 access but connects to a public bucket by default.

Quick Install

# Install SDV with all core dependencies
pip install sdv

# Install with Excel support
pip install "sdv[excel]"

# Install with Bayesian network support
pip install "sdv[pomegranate]"

# System packages (Ubuntu/Debian)
sudo apt-get install graphviz pandoc

Code Evidence

Python version constraint from `pyproject.toml:22`:

requires-python = '>=3.9,<3.15'

Python-version-conditional numpy dependency from `pyproject.toml:30-34`:

"numpy>=1.22.2;python_version<'3.10'",
"numpy>=1.24.0;python_version>='3.10' and python_version<'3.12'",
"numpy>=1.26.0;python_version>='3.12' and python_version<'3.13'",
"numpy>=2.1.0;python_version>='3.13' and python_version<'3.14'",
"numpy>=2.3.2;python_version>='3.14'",
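Installers resolve these lines by evaluating the PEP 508 marker after `;` against the running interpreter and keeping only the requirement whose marker is true. A stdlib-only sketch of that selection, assuming the markers contain nothing but `python_version` comparisons joined by `and` (tools such as `packaging.markers` handle the full grammar); `numpy_floor` is a hypothetical helper.

```python
import operator
import re

# The numpy requirement strings quoted above.
NUMPY_REQUIREMENTS = [
    "numpy>=1.22.2;python_version<'3.10'",
    "numpy>=1.24.0;python_version>='3.10' and python_version<'3.12'",
    "numpy>=1.26.0;python_version>='3.12' and python_version<'3.13'",
    "numpy>=2.1.0;python_version>='3.13' and python_version<'3.14'",
    "numpy>=2.3.2;python_version>='3.14'",
]

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}
# Matches one clause such as python_version>='3.10'
_CLAUSE = re.compile(r"python_version\s*(<=|>=|==|<|>)\s*'(\d+)\.(\d+)'")


def _marker_matches(marker: str, python: tuple) -> bool:
    """Evaluate 'and'-joined python_version clauses against (major, minor)."""
    for clause in marker.split(" and "):
        match = _CLAUSE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"unsupported marker clause: {clause!r}")
        op, major, minor = match.groups()
        # Compare integer tuples so that 3.10 sorts after 3.9.
        if not _OPS[op](tuple(python[:2]), (int(major), int(minor))):
            return False
    return True


def numpy_floor(python: tuple) -> str:
    """Return the numpy minimum version that applies to the given Python version."""
    for requirement in NUMPY_REQUIREMENTS:
        spec, marker = requirement.split(";", 1)
        if _marker_matches(marker, python):
            return spec.split(">=", 1)[1]
    raise LookupError(f"no numpy requirement matches Python {python}")


if __name__ == "__main__":
    import sys

    print("numpy floor for this interpreter:", numpy_floor(sys.version_info[:2]))
```

Versions are compared as integer tuples on purpose: as plain strings, `'3.10' < '3.9'`, which would select the wrong line for Python 3.10 and later.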

Optional CTGAN import handling from `sdv/single_table/ctgan.py:15-23`:

try:
    from ctgan import CTGAN, TVAE
    from ctgan.synthesizers._utils import get_enable_gpu_value
    import_error = None
except ModuleNotFoundError as e:
    CTGAN = None
    TVAE = None
    import_error = e

Optional deepecho import handling from `sdv/sequential/par.py:28-36`:

try:
    from deepecho import PARModel
    from deepecho.sequences import assemble_sequences
    import_error = None
except ModuleNotFoundError as e:
    PARModel = None
    assemble_sequences = None
    import_error = e

Custom `ModuleNotFoundError` message from `sdv/utils/mixins.py:4-12`:

class MissingModuleMixin:
    @classmethod
    def raise_module_not_found_error(cls, error):
        raise ModuleNotFoundError(
            f"{error.msg}. Please install {error.name} in order to use the '{cls.__name__}'."
        )
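The optional-import snippets and the mixin fit together: the import failure is captured when the module loads, and the error is re-raised with the friendlier message later, presumably when the synthesizer is created. A self-contained sketch of that pattern, using a deliberately nonexistent module `not_a_real_module_for_demo` and a hypothetical `DemoSynthesizer`; the exact point where SDV triggers the re-raise is not shown in the quoted code.

```python
# Deferred-import pattern: fail quietly at import time, keep the error.
try:
    import not_a_real_module_for_demo  # hypothetical optional dependency
    import_error = None
except ModuleNotFoundError as e:
    not_a_real_module_for_demo = None
    import_error = e  # keep a reference; `e` is cleared when the except block ends


class MissingModuleMixin:
    # Same logic as the SDV mixin quoted above.
    @classmethod
    def raise_module_not_found_error(cls, error):
        raise ModuleNotFoundError(
            f"{error.msg}. Please install {error.name} in order to use the '{cls.__name__}'."
        )


class DemoSynthesizer(MissingModuleMixin):
    """Hypothetical synthesizer that needs the optional module above."""

    def __init__(self):
        # Re-raise only when the class is actually used.
        if import_error is not None:
            self.raise_module_not_found_error(import_error)
```

Deferring the error keeps the package importable when an optional backend is missing, so only users of that specific synthesizer see the failure.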

System packages from `apt.txt:1-3`:

# apt-get requirements for development and mybinder environment
graphviz
pandoc

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'ctgan'. Please install ctgan in order to use the 'CTGANSynthesizer'.` | The ctgan package is not installed | `pip install ctgan` or `pip install sdv` |
| `ModuleNotFoundError: No module named 'deepecho'. Please install deepecho in order to use the 'PARSynthesizer'.` | The deepecho package is not installed | `pip install deepecho` or `pip install sdv` |
| `VersionError` when loading a saved synthesizer | The installed SDV version is older than the version that created the synthesizer | Upgrade SDV to the version shown in the error message |
| `SDVVersionWarning` on load | The SDV version used to fit the synthesizer differs from the installed version | Retrain the synthesizer on the current version to get the latest features |

Compatibility Notes

  • Python 3.14: Requires newer versions of all SDV ecosystem packages (copulas >= 0.14.0, ctgan >= 0.12.0, deepecho >= 0.8.0, rdt >= 1.20.0, sdmetrics >= 0.26.0, cloudpickle >= 3.1.1).
  • Python 3.9: Minimum supported version. Lower dependency floors apply: numpy >= 1.22.2 and pandas >= 1.4.0.
  • Windows: Fully supported, but the graphviz system package may require manual installation (its `bin` directory must be on `PATH`).
  • re module: SDV handles Python version differences in regex internals via fallback import (`re._parser` → `sre_parse`).
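The fallback import described above can be sketched as follows; this illustrates the pattern and is not SDV's exact code. On Python 3.11+ the regex parser lives in the private `re._parser` module, while earlier versions expose it as `sre_parse` (deprecated since 3.11). Both are private CPython internals, so their contents may change between releases.

```python
try:
    # Python 3.11+: the regex parser is the private module re._parser.
    import re._parser as sre_parse
except ImportError:
    # Python 3.9-3.10: the same parser is the top-level sre_parse module.
    import sre_parse

# Parse a pattern into its internal representation: one item per top-level
# element, here a literal 'a' followed by a repeat of 'b'.
parsed = sre_parse.parse("ab+")
print(len(parsed))  # 2
```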
