Principle:Sdv_dev_SDV_Metadata_Detection

Knowledge Sources	SDV Documentation SDV
Domains	Data_Science, Schema_Inference
Last Updated	2026-02-14 00:00 GMT

Overview

An automated schema inference mechanism that detects column types, primary keys, and foreign key relationships from raw DataFrames.

Description

Metadata detection automates the process of defining a data schema by analyzing the contents of one or more DataFrames. Instead of manually specifying each column's semantic data type (sdtype), primary keys, and inter-table relationships, the detection algorithm inspects column values to infer whether they are numerical, categorical, datetime, boolean, or ID columns.

For multi-table scenarios, the detector also identifies foreign key relationships between tables using column name matching heuristics. This produces a complete Metadata object that describes the full relational structure of the dataset.
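As a rough illustration of column name matching, the sketch below links a child table to a parent whenever the child has a column whose name is exactly the parent's primary key. It is not SDV's code: the exact-equality rule, the input dictionaries, and the function name `match_foreign_keys` are assumptions made for this example. The keys of each relationship record mirror the parameter names of SDV's `add_relationship`.

```python
def match_foreign_keys(primary_keys, table_columns):
    """Infer parent/child links by exact column-name matching (illustrative sketch).

    primary_keys: hypothetical dict, table name -> primary key column name.
    table_columns: hypothetical dict, table name -> list of column names.
    """
    relationships = []
    for parent, pk in primary_keys.items():
        for child, columns in table_columns.items():
            if child == parent:
                continue  # a table is not linked to itself here
            # Assumption: a child column sharing the parent's key name references it,
            # unless that column is also the child's own primary key.
            if pk in columns and primary_keys.get(child) != pk:
                relationships.append({
                    "parent_table_name": parent,
                    "parent_primary_key": pk,
                    "child_table_name": child,
                    "child_foreign_key": pk,
                })
    return relationships
```

Skipping children whose own primary key has the same name avoids linking every table that happens to use a generic key such as `id`. Even so, a shared column name does not guarantee a real reference, so the inferred relationships still need review.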

Usage

Use metadata detection when working with new datasets where the schema is unknown or when prototyping quickly. It is the recommended approach for initial exploration before manually refining column types. For production use, detected metadata should be reviewed and saved to JSON for reproducibility.
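The review-and-save step can be sketched with plain dictionaries and the standard `json` module. SDV's metadata objects provide their own methods for this (for example `update_column` and `save_to_json`); the dictionary layout, the `overrides` mapping, and both helper functions below are hypothetical stand-ins that only show the workflow: correct what detection got wrong, then write the result to a file that later runs can load.

```python
import copy
import json
from pathlib import Path


def save_reviewed_metadata(metadata, overrides, path):
    """Apply manual sdtype corrections, then persist the result as JSON.

    metadata: hypothetical dict of the form
        {"tables": {table: {"columns": {column: {"sdtype": ...}}}}}
    overrides: {(table, column): sdtype} corrections from a manual review.
    """
    reviewed = copy.deepcopy(metadata)  # leave the detected metadata unchanged
    for (table, column), sdtype in overrides.items():
        reviewed["tables"][table]["columns"][column]["sdtype"] = sdtype
    Path(path).write_text(json.dumps(reviewed, indent=2, sort_keys=True))
    return reviewed


def load_metadata(path):
    """Reload a saved schema so later runs use exactly the reviewed version."""
    return json.loads(Path(path).read_text())
```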

Theoretical Basis

Schema inference applies heuristic classification rules to column data:

Type inference: Analyze value distributions to classify columns as numerical (continuous values), categorical (limited unique values), datetime (parseable timestamps), boolean (two-value), or ID (unique per row)
Key detection: Columns with all unique values are candidates for primary keys
Relationship detection: Foreign keys are inferred by matching column names across tables (column_name_match algorithm)
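A minimal sketch of the key-detection rule, assuming a table stored as a hypothetical column-oriented dictionary (column name to list of values) and treating missing values (`None`) as disqualifying:

```python
def primary_key_candidates(table):
    """Return column names whose values are all present and pairwise distinct.

    table: hypothetical dict, column name -> list of hashable values.
    """
    candidates = []
    for name, values in table.items():
        if None in values:
            continue  # assumption: a key column must not contain missing values
        if len(set(values)) == len(values):  # every value is unique
            candidates.append(name)
    return candidates
```

Uniqueness alone is permissive: in a table with `guest_id` and a continuous `score` column holding distinct floats, both come back as candidates, so a further rule or a manual review still has to choose the key.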

The inference follows a priority ordering: ID > datetime > numerical > categorical > boolean, with fallback to unknown for ambiguous columns.
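The priority ordering can be read as a chain of checks where the first match wins. The sketch below applies the five rules in the stated order and falls back to `unknown`. Every predicate is a simplified assumption, not SDV's logic: IDs are unique int or string values, datetimes are ISO 8601 strings, and categorical means string values with at most `max_categories` distinct entries. Restricting categorical to strings is what lets two-valued `True`/`False` columns reach the boolean check despite its lower priority.

```python
from datetime import datetime


def _present(values):
    """Drop missing entries (None) before inspecting a column."""
    return [v for v in values if v is not None]


def looks_like_id(values):
    # Assumption: IDs are int or str (not bool or float), with no gaps and one distinct value per row.
    vals = _present(values)
    return (
        len(vals) > 1
        and all(isinstance(v, (int, str)) and not isinstance(v, bool) for v in vals)
        and len(set(vals)) == len(values)
    )


def looks_like_datetime(values):
    # Assumption: only ISO 8601 strings count as parseable timestamps.
    vals = _present(values)
    if not vals or not all(isinstance(v, str) for v in vals):
        return False
    try:
        for v in vals:
            datetime.fromisoformat(v)
    except ValueError:
        return False
    return True


def looks_numerical(values):
    vals = _present(values)
    return bool(vals) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in vals
    )


def looks_categorical(values, max_categories=10):
    # Assumption: "limited unique values" means strings with few distinct entries.
    vals = _present(values)
    return (
        bool(vals)
        and all(isinstance(v, str) for v in vals)
        and len(set(vals)) <= max_categories
    )


def looks_boolean(values):
    vals = _present(values)
    return bool(vals) and all(isinstance(v, bool) for v in vals)


# Checks in the stated priority order: ID > datetime > numerical > categorical > boolean.
PRIORITY = [
    ("id", looks_like_id),
    ("datetime", looks_like_datetime),
    ("numerical", looks_numerical),
    ("categorical", looks_categorical),
    ("boolean", looks_boolean),
]


def infer_sdtype(values):
    """Return the first sdtype whose check accepts the column, else 'unknown'."""
    for sdtype, check in PRIORITY:
        if check(values):
            return sdtype
    return "unknown"
```

Because ID is checked first, a column of distinct timestamp strings such as `["2024-01-01", "2024-01-02"]` is reported as `id` rather than `datetime`, which is one concrete reason to review detected metadata before relying on it.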

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_Metadata_Detect_From_Dataframes
