Principle:Sdv dev SDV Metadata Detection

From Leeroopedia
Knowledge Sources
Domains: Data_Science, Schema_Inference
Last Updated: 2026-02-14 00:00 GMT

Overview

An automated schema inference mechanism that detects column types, primary keys, and foreign key relationships from raw DataFrames.

Description

Metadata detection automates the process of defining a data schema by analyzing the contents of one or more DataFrames. Instead of manually specifying each column's semantic data type (sdtype), primary keys, and inter-table relationships, the detection algorithm inspects column values to infer whether they are numerical, categorical, datetime, boolean, or ID columns.

For multi-table scenarios, the detector also infers foreign key relationships between tables by matching column names across them. The result is a complete Metadata object that describes the full relational structure of the dataset.

Usage

Use metadata detection when working with a new dataset whose schema is unknown, or when prototyping quickly. It is the recommended starting point for initial exploration, before column types are refined by hand. For production use, review the detected metadata and save it to JSON so that the same schema can be reloaded reproducibly.
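
The workflow above (detect, review, save) might look like the following minimal sketch. It assumes a recent SDV 1.x release in which `sdv.metadata.Metadata` provides `detect_from_dataframes`, `validate`, and `save_to_json`; the helper name `detect_and_save` and its parameters are hypothetical, introduced only for illustration.

```python
def detect_and_save(tables, json_path):
    """Detect metadata for a set of tables and persist it for later review.

    tables: dict mapping table name -> pandas.DataFrame (assumed input shape)
    json_path: destination file for the detected metadata
    """
    # Imported lazily so the helper can be defined even where SDV is absent.
    from sdv.metadata import Metadata

    # Infer sdtypes, primary keys, and relationships from the raw tables.
    metadata = Metadata.detect_from_dataframes(data=tables)

    # Check the detected schema for internal consistency before saving it.
    metadata.validate()

    # Persist the schema so later runs reload it instead of re-detecting.
    metadata.save_to_json(json_path)
    return metadata
```

In later runs, the saved file can be reloaded with `Metadata.load_from_json(json_path)` rather than detected again, so that manual corrections to the schema are preserved.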

Theoretical Basis

Schema inference applies heuristic classification rules to column data:

  1. Type inference: Analyze value distributions to classify columns as numerical (continuous values), categorical (a limited number of unique values), datetime (parseable timestamps), boolean (two distinct values), or ID (unique per row).
  2. Key detection: Columns whose values are all unique are candidates for primary keys.
  3. Relationship detection: Foreign keys are inferred by matching column names across tables (the column_name_match algorithm).
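
Steps 2 and 3 can be illustrated with a small standalone sketch in plain Python. It is not SDV's implementation: the function names are hypothetical, tables are modeled as dicts of column lists, and name matching is assumed to be an exact match between a child column and a parent's primary key.

```python
def candidate_primary_keys(table):
    """Return the columns whose values are all non-null and unique."""
    keys = []
    for name, values in table.items():
        if values and None not in values and len(set(values)) == len(values):
            keys.append(name)
    return keys


def match_foreign_keys(tables, primary_keys):
    """Link tables when a child column has the same name as a parent's key.

    Assumption: exact name equality; the real heuristic may be more lenient.
    """
    relationships = []
    for parent, key in primary_keys.items():
        for child, columns in tables.items():
            if child != parent and key in columns:
                relationships.append({
                    "parent_table_name": parent,
                    "parent_primary_key": key,
                    "child_table_name": child,
                    "child_foreign_key": key,
                })
    return relationships


# Toy data: each table maps column name -> list of values.
users = {"user_id": [1, 2, 3], "country": ["US", "FR", "US"]}
orders = {
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 3, 2],
    "amount": [9.5, 20.0, 3.25, 20.0],
}
tables = {"users": users, "orders": orders}

# Take the first candidate in each table as its primary key.
primary_keys = {name: candidate_primary_keys(t)[0] for name, t in tables.items()}
relationships = match_foreign_keys(tables, primary_keys)
```

Here `orders.user_id` repeats, so it is not a key candidate in `orders`, but it shares its name with the primary key of `users` and is therefore linked as a foreign key. Because name matching does not inspect the values, detected relationships are worth reviewing by hand.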

The inference follows a priority ordering: ID > datetime > numerical > categorical > boolean. When a column satisfies more than one rule, the highest-priority match wins; columns that satisfy none are assigned the fallback sdtype unknown.
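
One way to picture the priority ordering is as an ordered list of rules in which the first match wins. The sketch below uses only the Python standard library; the individual predicates (for example, treating only unique integer columns as IDs and only ISO 8601 strings as datetimes) are simplifying assumptions made for illustration, not SDV's actual rules.

```python
from datetime import datetime


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_id(values):
    # Assumption: an ID column holds integers that never repeat.
    return all(_is_int(v) for v in values) and len(set(values)) == len(values)


def _is_datetime(values):
    # Assumption: datetimes arrive as ISO 8601 strings.
    if not all(isinstance(v, str) for v in values):
        return False
    try:
        for v in values:
            datetime.fromisoformat(v)
    except ValueError:
        return False
    return True


def _is_numerical(values):
    return all(_is_int(v) or isinstance(v, float) for v in values)


def _is_categorical(values):
    # Assumption: strings with at least one repeated value.
    return all(isinstance(v, str) for v in values) and len(set(values)) < len(values)


def _is_boolean(values):
    return all(isinstance(v, bool) for v in values)


# Rules listed in priority order: the first predicate that matches wins.
RULES = [
    ("id", _is_id),
    ("datetime", _is_datetime),
    ("numerical", _is_numerical),
    ("categorical", _is_categorical),
    ("boolean", _is_boolean),
]


def infer_sdtype(values):
    """Return the first matching sdtype, or 'unknown' if no rule applies."""
    if not values:
        return "unknown"
    for sdtype, predicate in RULES:
        if predicate(values):
            return sdtype
    return "unknown"
```

The ordering matters for overlapping cases: `[1, 2, 3]` satisfies both the ID and the numerical rule and is labeled `id`, while `[1, 1, 2]` fails the uniqueness check and falls through to `numerical`. A column of unique free-text strings matches no rule in this sketch and is labeled `unknown`.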

Related Pages

Implemented By
