Workflow: SDV Sequential Data Synthesis
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, Sequential_Data, Timeseries, Deep_Learning |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
End-to-end process for generating synthetic sequential (timeseries) data using the PARSynthesizer, preserving temporal patterns, sequence structures, and context column relationships.
Description
This workflow covers the generation of synthetic sequential data where rows are ordered in time and grouped by entity. The PARSynthesizer (Probabilistic AutoRegressive) uses a deep learning autoregressive model from the DeepEcho library to learn temporal dependencies within sequences. It separates columns into context columns (attributes that remain constant within a sequence, such as entity ID or category) and non-context columns (values that change over time). Context columns are modeled by a separate GaussianCopulaSynthesizer, and the sequential columns are generated conditioned on these context values.
Usage
Execute this workflow when you have timeseries or sequential data where rows are grouped by a sequence key (e.g., patient ID, device ID) and ordered by time, and you need to generate synthetic sequences that preserve temporal patterns, trends, and entity-level context. Common applications include IoT sensor data, patient health records over time, financial transaction sequences, and stock price histories.
Execution Steps
Step 1: Load sequential data
Obtain the real sequential dataset as a pandas DataFrame. The SDV demo downloader supports the sequential modality and can fetch example timeseries datasets. The data must contain a sequence key column that identifies which entity each row belongs to.
Key considerations:
- The DataFrame must be sorted by sequence key and time
- Each sequence (group of rows sharing the same key) represents one entity's timeline
- The SDV demo provides example sequential datasets with pre-built metadata
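The expected shape of the input can be sketched with a toy dataset. The column names (device_id, timestamp, temperature, location) are illustrative, not part of SDV; alternatively, sdv.datasets.demo.download_demo with modality='sequential' fetches example datasets with pre-built metadata.

```python
import pandas as pd

# Minimal sequential dataset: rows are grouped by a sequence key
# ('device_id') and ordered by a time column ('timestamp').
# 'location' stays constant within each sequence, making it a
# candidate context column.
data = pd.DataFrame({
    'device_id': ['d1'] * 4 + ['d2'] * 4,
    'timestamp': pd.to_datetime(
        ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'] * 2),
    'temperature': [20.1, 20.5, 21.0, 20.8, 18.9, 19.2, 19.7, 19.5],
    'location': ['north'] * 4 + ['south'] * 4,
})

print(data.groupby('device_id').size())  # two sequences of length 4
```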
Step 2: Define metadata with sequence key
Create a Metadata object that describes the table schema including the sequence key column. The sequence key identifies which rows belong to the same sequence (entity). Additionally, identify context columns that remain constant within each sequence.
Key considerations:
- The sequence key column must be marked in the metadata as an ID column
- Context columns are values that do not change within a single sequence
- Non-context columns are the time-varying values the model will learn to generate autoregressively
- Validate that every sequence has consistent context column values
Step 3: Initialize PARSynthesizer
Instantiate PARSynthesizer with the metadata, the sequence key, and optionally the list of context columns. Configure training parameters such as epochs, segment size, sample size, and CUDA usage.
Key considerations:
- context_columns specifies which columns are constant per sequence
- epochs controls training duration (default 128)
- segment_size can split long sequences into shorter training segments
- sample_size controls how many candidate samples are generated before selecting the best one
- cuda enables GPU acceleration if available
- Requires the DeepEcho library to be installed
Step 4: Fit on sequential data
Call fit with the DataFrame. Internally, the PARSynthesizer separates context and non-context columns, fits a GaussianCopulaSynthesizer on the context data, and trains the DeepEcho PARModel on the sequential data. Numerical columns are differenced and formatted before training.
Key considerations:
- Context columns are extracted per unique sequence key and modeled independently
- Non-context columns are assembled into sequences and fed to the autoregressive model
- The model learns conditional distributions for each time step given previous steps
- Training loss values can be retrieved after fitting for monitoring
Step 5: Sample synthetic sequences
Generate new synthetic sequences by calling sample with the desired number of sequences. The sampler first generates context values from the context synthesizer, then produces sequential data conditioned on these contexts.
Key considerations:
- num_sequences controls how many entity timelines to generate
- sequence_length can fix the length of generated sequences
- Each synthetic sequence gets a new unique sequence key
- Context column values in each sequence are internally consistent
- Conditional sampling is supported to fix specific context values
Step 6: Evaluate sequential data quality
Assess the quality of synthetic sequential data using the single-table evaluation functions applied to the flattened output. Compare temporal distributions, context value distributions, and sequence length characteristics.
Key considerations:
- Standard quality reports can evaluate column-level distributions
- Sequence-specific quality (temporal autocorrelation, trend preservation) may require custom analysis
- Compare the distribution of sequence lengths between real and synthetic data