Principle:Sdv dev SDV Multi Table Data Sampling

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, Relational_Data, Data_Generation
Last Updated 2026-02-14 00:00 GMT

Overview

A hierarchical sampling process that generates synthetic relational data while preserving referential integrity and inter-table statistical relationships.

Description

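Multi-table data sampling generates synthetic data for every table in a relational dataset in a single call. The hierarchical sampler works top-down: parent tables are sampled first, and each child table is then sampled conditioned on the parent rows it belongs to. The scale parameter controls the output size relative to the original data. Referential integrity (foreign key relationships) is maintained by construction: every child row is generated for a specific sampled parent row and takes that parent's primary key as its foreign key, so no child can reference a missing parent.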
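
Since each child row copies its foreign key from the parent row it was generated for, sampled output should contain no orphaned foreign keys. The sketch below checks this on tables represented as lists of dicts. The helper orphan_foreign_keys and the users and sessions tables are hypothetical. With real sampled DataFrames, the same idea is a membership test of the child's foreign-key column against the parent's primary-key column.

```python
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def orphan_foreign_keys(
    parents: List[Row], children: List[Row], primary_key: str, foreign_key: str
) -> List[Hashable]:
    """Return child foreign-key values that match no parent primary key."""
    parent_keys = {row[primary_key] for row in parents}
    return [row[foreign_key] for row in children if row[foreign_key] not in parent_keys]


# Hypothetical toy tables standing in for sampled output.
users = [{"user_id": 0}, {"user_id": 1}]
sessions = [
    {"session_id": 0, "user_id": 0},
    {"session_id": 1, "user_id": 1},
    {"session_id": 2, "user_id": 1},
]
print(orphan_foreign_keys(users, sessions, "user_id", "user_id"))  # -> []
```

An empty list means every child points at an existing parent. Any value returned is a foreign key with no matching parent row.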

Usage

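Call sample on a fitted HMASynthesizer. It returns a dictionary that maps each table name to a DataFrame of synthetic rows. The scale parameter controls the output size: 1.0 produces roughly the same row counts as the original data, values above 1.0 produce more rows, and values below 1.0 produce fewer rows. Because child row counts are themselves sampled per parent, child table sizes follow the scale only approximately.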
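
For a single parent/child pair, the effect of scale can be estimated with simple arithmetic. This sketch makes two assumptions. First, the root table is resized by the scale factor, rounded to a whole row count. Second, the expected number of children per sampled parent matches the original ratio. The helper approximate_sizes is hypothetical, and the library's exact rounding may differ.

```python
from typing import Tuple


def approximate_sizes(root_rows: int, child_rows: int, scale: float) -> Tuple[int, float]:
    """Rough output sizes for one parent/child pair at a given scale.

    Assumptions (not confirmed library behaviour): the root table is resized to
    round(root_rows * scale), and each sampled parent draws a child count whose
    mean matches the original children-per-parent ratio.
    """
    if scale <= 0:
        raise ValueError("scale must be greater than 0")
    if root_rows <= 0:
        raise ValueError("root_rows must be positive")
    sampled_roots = round(root_rows * scale)
    mean_children = child_rows / root_rows
    # The child size is an expectation, not a guaranteed count.
    return sampled_roots, sampled_roots * mean_children


print(approximate_sizes(100, 250, 2.0))  # -> (200, 500.0)
print(approximate_sizes(100, 250, 0.5))  # -> (50, 125.0)
```

The child figure is an expected value. A particular sample can have more or fewer child rows because each parent's count is drawn rather than fixed.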

Theoretical Basis

  1. Root sampling: Sample rows for root tables (tables with no parents)
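  2. Hierarchical descent: For each sampled parent row, read the number of child rows to generate from its extension columns (columns added to a parent table during fitting that summarize each parent's children, such as the child row count and per-parent child model parameters)
  3. Child generation: Sample child rows conditioned on the parent row's extension values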
  4. Foreign key assignment: Assign the parent's primary key as the child's foreign key
  5. Cleanup: Remove extension columns from all tables
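
The five steps above can be sketched with toy stand-ins for the fitted models. This is not the HMASynthesizer implementation. Random draws replace the learned parent and child models. The table names (users, orders), the column names, the __child__ extension-column prefix, and every function name here are hypothetical. The sketch only illustrates the order of operations and why the foreign keys stay valid.

```python
import random
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
EXT = "__child__"  # hypothetical prefix marking extension columns


def sample_root(rng: random.Random, num_rows: int) -> List[Row]:
    """Step 1: sample root rows (toy stand-in for the fitted parent model).

    Each row holds ordinary columns plus extension columns describing its children.
    """
    return [
        {
            "user_id": pk,                                # primary key
            "age": rng.randint(18, 80),                   # ordinary column
            EXT + "num_rows": rng.randint(0, 3),          # how many children to sample
            EXT + "amount_mean": rng.uniform(5.0, 50.0),  # child-model parameter
        }
        for pk in range(num_rows)
    ]


def sample_children(rng: random.Random, parents: List[Row]) -> List[Row]:
    """Steps 2-4: read count and parameters from each parent's extension columns,
    sample that many children, and copy the parent's key into each child."""
    children: List[Row] = []
    for parent in parents:
        count = parent[EXT + "num_rows"]    # step 2: child row count
        mean = parent[EXT + "amount_mean"]  # parameter for the toy child model
        for _ in range(count):
            children.append({
                "order_id": len(children),                 # unique child key
                "amount": round(rng.gauss(mean, 1.0), 2),  # step 3: conditioned draw
                "user_id": parent["user_id"],              # step 4: foreign key
            })
    return children


def drop_extensions(rows: List[Row]) -> List[Row]:
    """Step 5: strip extension columns from the output."""
    return [{k: v for k, v in row.items() if not k.startswith(EXT)} for row in rows]


def sample_hierarchy(num_users: int, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Run steps 1-5 for a single parent/child pair with a fixed seed."""
    rng = random.Random(seed)
    users = sample_root(rng, num_users)
    orders = sample_children(rng, users)
    return drop_extensions(users), drop_extensions(orders)


users, orders = sample_hierarchy(20, seed=1)
print(len(users), len(orders))
```

Every order's user_id is copied from a user that was actually sampled, so integrity holds without any post-hoc repair. The number of orders varies with the seed because each user's child count is drawn in step 1.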

Related Pages

Implemented By

Uses Heuristic
