Principle:Sdv dev SDV Schema Simplification

Knowledge Sources	SDV Documentation SDV
Domains	Data_Engineering, Synthetic_Data
Last Updated	2026-02-14 00:00 GMT

Overview

Schema simplification is a data reduction technique that shrinks complex multi-table schemas by removing distant tables and excess columns, enabling faster prototyping with hierarchical synthesizers.

Description

Schema simplification addresses the challenge of working with large, complex relational databases when using HMA synthesis. Complex schemas with many tables and columns can cause the HMA algorithm to create an excessive number of extension columns during table augmentation, which slows fitting and can lower the quality of the synthetic data. The simplification process removes tables beyond the grandchild level, strips modelable columns from grandchild tables (leaving only their keys), reduces the columns in child tables, and eliminates relationships that are not connected to the main root table.

A companion operation, random subsetting, reduces the number of rows while preserving referential integrity.
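To make "preserving referential integrity" concrete, the following is a minimal sketch of random subsetting on a single parent/child pair, using plain lists of dicts. The function name `random_subset`, its parameters, and the toy tables are hypothetical and chosen for illustration; this is not SDV's own implementation, which works on a full multi-table dataset.

```python
import random


def random_subset(parent_rows, child_rows, parent_key, foreign_key, num_rows, seed=0):
    """Sample parent rows, then keep only child rows that point at a sampled parent.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    rng = random.Random(seed)
    sampled_parents = rng.sample(parent_rows, min(num_rows, len(parent_rows)))
    kept_ids = {row[parent_key] for row in sampled_parents}
    # Dropping children whose parent was not sampled avoids dangling foreign keys.
    kept_children = [row for row in child_rows if row[foreign_key] in kept_ids]
    return sampled_parents, kept_children


# Toy data: 5 hotels, 12 guests, each guest referencing one hotel.
hotels = [{'hotel_id': i} for i in range(5)]
guests = [{'guest_id': g, 'hotel_id': g % 5} for g in range(12)]

subset_hotels, subset_guests = random_subset(
    hotels, guests, parent_key='hotel_id', foreign_key='hotel_id', num_rows=2
)
```

After subsetting, every remaining guest row still references a hotel that is present in the reduced parent table, which is the property a multi-table synthesizer relies on.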

Usage

Use schema simplification as an optional preprocessing step before fitting HMASynthesizer when the multi-table dataset has a complex schema with many tables or columns. It is particularly useful for proof-of-concept workflows where fast iteration matters more than complete fidelity to the original schema.

Theoretical Basis

The simplification algorithm operates hierarchically:

Identify root table: Find the table with no parent (or the largest root if multiple exist)
Prune distant tables: Keep only children and grandchildren of the root
Reduce grandchild columns: Remove all modelable columns from grandchild tables (keep only keys)
Reduce child columns: Keep a subset of modelable columns in child tables
Update metadata: Remove pruned relationships and columns from metadata
Estimate column count: Apply the simplification only if the estimated number of extension columns exceeds the threshold (1000); otherwise leave the schema unchanged
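The steps above can be sketched on a toy schema representation. This is an illustrative sketch under stated assumptions, not the library's code: the dict layout (`keys` and `modelable` per table), the function names, the `max_child_columns` parameter, the choice of "largest root" as the root with the most descendants, and keeping the first few child columns are all assumptions. The extension-column estimate is passed in as a number because the text does not give its formula, and the sketch checks it first since it decides whether any pruning happens.

```python
from collections import defaultdict

THRESHOLD = 1000  # extension-column threshold quoted in the text


def descendants(table, children_of, max_depth=None):
    """Return {descendant: depth} reachable from `table`, breadth-first."""
    depths = {}
    frontier = [table]
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        depth += 1
        next_frontier = []
        for parent in frontier:
            for child in children_of.get(parent, []):
                if child not in depths and child != table:
                    depths[child] = depth
                    next_frontier.append(child)
        frontier = next_frontier
    return depths


def simplify(tables, relationships, estimated_columns, max_child_columns=1):
    """Hypothetical sketch of the hierarchical simplification steps."""
    # Gate: leave the schema untouched when the estimate is within the threshold.
    if estimated_columns <= THRESHOLD:
        return tables, relationships

    children_of = defaultdict(list)
    for parent, child in relationships:
        children_of[parent].append(child)

    # Identify the root: a table with no parent. Assumption: with several
    # roots, "largest" means the one with the most descendants.
    has_parent = {child for _, child in relationships}
    roots = [name for name in tables if name not in has_parent]
    root = max(roots, key=lambda name: len(descendants(name, children_of)))

    # Prune distant tables: keep the root, its children, and its grandchildren.
    depth = {root: 0, **descendants(root, children_of, max_depth=2)}

    kept_tables = {}
    for name, level in depth.items():
        columns = tables[name]
        if level == 0:
            modelable = list(columns['modelable'])
        elif level == 1:
            # Assumption: keep the first N modelable columns of a child table.
            modelable = columns['modelable'][:max_child_columns]
        else:
            modelable = []  # grandchildren keep only their keys
        kept_tables[name] = {'keys': list(columns['keys']), 'modelable': modelable}

    # Update metadata: drop relationships that touch a pruned table.
    kept_relationships = [(p, c) for p, c in relationships if p in depth and c in depth]
    return kept_tables, kept_relationships


# Toy schema: hotels -> guests -> payments -> refunds, plus a disconnected table.
tables = {
    'hotels': {'keys': ['hotel_id'], 'modelable': ['city', 'rating']},
    'guests': {'keys': ['guest_id', 'hotel_id'], 'modelable': ['age', 'amount', 'room_type']},
    'payments': {'keys': ['payment_id', 'guest_id'], 'modelable': ['method', 'total']},
    'refunds': {'keys': ['refund_id', 'payment_id'], 'modelable': ['reason']},
    'audit': {'keys': ['audit_id'], 'modelable': ['note']},
}
relationships = [('hotels', 'guests'), ('guests', 'payments'), ('payments', 'refunds')]

simple_tables, simple_relationships = simplify(
    tables, relationships, estimated_columns=5000, max_child_columns=2
)
```

In this toy run, `refunds` (a great-grandchild) and `audit` (not connected to the chosen root) are dropped, `payments` keeps only its key columns, and `guests` keeps two of its three modelable columns.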

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_Simplify_Schema

Uses Heuristic

Heuristic:Sdv_dev_SDV_HMA_Schema_Simplification
