Principle:Sdv dev SDV Single Table Model Fitting
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Synthetic_Data |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A data preprocessing and model training pipeline that transforms raw tabular data and fits a statistical or neural model to learn its distributions.
Description
Single-table model fitting is the core training step in any SDV synthesis workflow. It takes a raw DataFrame, applies a series of preprocessing transformations (type conversion, anonymization, numerical formatting via the DataProcessor and HyperTransformer), and then fits the underlying statistical or neural model on the transformed data.
The fitting pipeline handles constraint transformations (if any constraints are registered), column type conversions, and missing value imputation. Its output is a fitted model that can subsequently generate synthetic data.
Usage
Use model fitting after initializing a synthesizer and before sampling. The fit method must be called with the complete real dataset. After fitting, the synthesizer is ready to generate synthetic data via the sample method.
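The initialize, fit, sample ordering can be illustrated with a minimal sketch. `ToySynthesizer` is a hypothetical stand-in written only for this illustration; it is not SDV's implementation, and its "model" simply remembers the observed values of each column. Only the method names `fit` and `sample` come from the text above.

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer; not SDV's implementation."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._columns = None  # per-column pools of observed values
        self._fitted = False

    def fit(self, rows):
        """Learn from the complete real dataset (a list of dicts)."""
        if not rows:
            raise ValueError("fit() needs a non-empty dataset")
        # Toy "model": remember every observed value of each column.
        self._columns = {name: [row[name] for row in rows] for name in rows[0]}
        self._fitted = True

    def sample(self, num_rows):
        """Draw synthetic rows; only valid after fit()."""
        if not self._fitted:
            raise RuntimeError("call fit() before sample()")
        return [
            {name: self._rng.choice(values) for name, values in self._columns.items()}
            for _ in range(num_rows)
        ]


real_data = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 27, "plan": "basic"},
]

synthesizer = ToySynthesizer()     # 1. initialize
synthesizer.fit(real_data)         # 2. fit on the full real dataset
synthetic = synthesizer.sample(5)  # 3. sample only after fitting
```

Calling `sample` on an unfitted `ToySynthesizer` raises a `RuntimeError`, which mirrors the ordering requirement described above.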
Theoretical Basis
The fitting pipeline follows these stages:
- Preprocessing: Raw data passes through DataProcessor, which applies HyperTransformer to convert columns to model-compatible formats
- Constraint transformation: If constraints are registered, the data is transformed into a representation from which only constraint-satisfying values can be recovered
- Model fitting: The preprocessed data is passed to the synthesizer's _fit method, which trains the underlying model (e.g., via GaussianMultivariate.fit or CTGAN.fit)
- State update: The synthesizer marks itself as fitted and records metadata about the fit operation
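The stages above can be sketched for a single numeric column. This is an illustrative toy under stated assumptions, not SDV code: `preprocess`, `apply_constraint`, `ToyGaussianModel`, and `ToyPipelineSynthesizer` are hypothetical names, mean imputation stands in for the real transformers, and the "constraint" is a simple offset from a lower bound.

```python
from datetime import datetime, timezone
from statistics import mean, stdev


def preprocess(values):
    """Stage 1 (toy): convert to float and impute missing values with the mean."""
    present = [float(v) for v in values if v is not None]
    fill = mean(present)
    return [float(v) if v is not None else fill for v in values]


def apply_constraint(values, lower_bound):
    """Stage 2 (toy): represent each value as its offset from a lower bound."""
    return [v - lower_bound for v in values]


class ToyGaussianModel:
    """Stage 3 (toy): hypothetical model that learns a mean and standard deviation."""

    def fit(self, values):
        self.mean = mean(values)
        self.std = stdev(values)


class ToyPipelineSynthesizer:
    """Hypothetical wrapper that runs the four stages in order."""

    def __init__(self, lower_bound=None):
        self.lower_bound = lower_bound
        self.model = ToyGaussianModel()
        self.fitted = False
        self.fitted_at = None

    def fit(self, values):
        processed = preprocess(values)                 # preprocessing
        if self.lower_bound is not None:               # constraint transformation
            processed = apply_constraint(processed, self.lower_bound)
        self.model.fit(processed)                      # model fitting
        self.fitted = True                             # state update
        self.fitted_at = datetime.now(timezone.utc)


synth = ToyPipelineSynthesizer(lower_bound=10.0)
synth.fit([12, None, 18, "15"])  # mixed types and a missing value
```

In this sketch the constraint step is skipped entirely when no bound is configured, matching the "if constraints are registered" condition in the list.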