Principle:Sdv dev SDV Metadata Detection

From Leeroopedia
Revision as of 18:01, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Sdv_dev_SDV_Metadata_Detection.md)
Knowledge Sources
Domains: Data_Science, Schema_Inference
Last Updated: 2026-02-14 00:00 GMT

Overview

An automated schema inference mechanism that detects column types, primary keys, and foreign key relationships from raw DataFrames.

Description

Metadata detection automates the process of defining a data schema by analyzing the contents of one or more DataFrames. Instead of manually specifying each column's semantic data type (sdtype), primary keys, and inter-table relationships, the detection algorithm inspects column values to infer whether they are numerical, categorical, datetime, boolean, or ID columns.

For multi-table scenarios, the detector also infers foreign key relationships between tables using column name matching heuristics. The result is a single Metadata object that describes each table's columns and primary key together with the inferred relationships between tables.
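The column name matching idea can be illustrated with a minimal sketch in plain Python. This is not SDV's implementation: the function name `match_foreign_keys` and the input layout are hypothetical, and the rule used here (a column in another table that shares its name with a table's primary key is treated as a foreign key) is an assumption about what name matching means in the simplest case.

```python
def match_foreign_keys(tables):
    """Infer parent/child links by primary-key name matching.

    `tables` maps a table name to {"primary_key": str or None,
    "columns": list of column names}. Hypothetical layout for illustration.
    """
    relationships = []
    for parent, parent_info in tables.items():
        pk = parent_info["primary_key"]
        if pk is None:
            continue
        for child, child_info in tables.items():
            if child == parent:
                continue
            # Skip tables that use the same name as their own primary key:
            # name matching alone cannot tell which side is the parent.
            if pk in child_info["columns"] and child_info["primary_key"] != pk:
                relationships.append({
                    "parent_table_name": parent,
                    "parent_primary_key": pk,
                    "child_table_name": child,
                    "child_foreign_key": pk,
                })
    return relationships


tables = {
    "users": {"primary_key": "user_id", "columns": ["user_id", "name"]},
    "orders": {"primary_key": "order_id",
               "columns": ["order_id", "user_id", "amount"]},
    "items": {"primary_key": "item_id",
              "columns": ["item_id", "order_id", "sku"]},
}
relationships = match_foreign_keys(tables)
```

Because the rule looks only at names, a foreign key stored under a different name (for example `customer` pointing at `users.user_id`) would be missed, and an unrelated column that happens to share a name would be linked by mistake. Both cases are reasons to review detected relationships by hand.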

Usage

Use metadata detection when working with new datasets where the schema is unknown or when prototyping quickly. It is the recommended approach for initial exploration before manually refining column types. For production use, detected metadata should be reviewed and saved to JSON for reproducibility.
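The review-then-save workflow can be sketched with the standard library alone. The dictionary layout below is a hypothetical stand-in for detected metadata, not a guaranteed match for SDV's file format, and the file name is arbitrary. In SDV itself, the Metadata class is understood to expose `save_to_json` and `load_from_json` for the same purpose; check the installed version's documentation before relying on them.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical detection result: the ZIP code column was inferred as numerical.
detected = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {"sdtype": "id"},
                "zip": {"sdtype": "numerical"},
            },
        }
    },
    "relationships": [],
}

# Review step: a ZIP code is a label, not a quantity, so correct it by hand.
detected["tables"]["users"]["columns"]["zip"]["sdtype"] = "categorical"

# Persist the reviewed schema so later runs reuse it instead of re-detecting.
path = Path(tempfile.mkdtemp()) / "metadata.json"
path.write_text(json.dumps(detected, indent=2, sort_keys=True))

# A later run loads the saved file rather than inferring the schema again.
reloaded = json.loads(path.read_text())
```

Saving the reviewed file matters because detection depends on the data it sees: a different sample of the same table could yield a different inferred schema.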

Theoretical Basis

Schema inference applies heuristic classification rules to column data:

  1. Type inference: Analyze value distributions to classify columns as numerical (continuous values), categorical (a limited number of unique values), datetime (parseable timestamps), boolean (two distinct values), or ID (unique per row).
  2. Key detection: Columns in which every value is unique are candidates for primary keys.
  3. Relationship detection: Foreign keys are inferred by matching column names across tables (the column_name_match algorithm).

When a column satisfies more than one rule, the inference follows a priority ordering: ID > datetime > numerical > categorical > boolean. Columns that match none of the rules fall back to unknown.
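A first-match-wins rule list is one simple way to realize such a priority ordering. The sketch below operates on plain Python lists rather than DataFrames and is not SDV's detection code: every predicate, the `max_unique` threshold, and the name `infer_sdtype` are illustrative assumptions.

```python
from datetime import date, datetime


def _typed(value, kinds):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, kinds) and not isinstance(value, bool)


def is_id(vals):
    same_kind = (all(_typed(v, int) for v in vals)
                 or all(isinstance(v, str) for v in vals))
    return same_kind and len(set(vals)) == len(vals)


def is_datetime(vals):
    def parses(v):
        if isinstance(v, (date, datetime)):
            return True
        if not isinstance(v, str):
            return False
        try:
            datetime.fromisoformat(v)  # assumes ISO 8601 strings only
            return True
        except ValueError:
            return False
    return all(parses(v) for v in vals)


def is_numerical(vals):
    return all(_typed(v, (int, float)) for v in vals)


def is_categorical(vals, max_unique=10):  # threshold is an assumption
    return all(isinstance(v, str) for v in vals) and len(set(vals)) <= max_unique


def is_boolean(vals):
    return all(isinstance(v, bool) for v in vals)


# Rules are tried in priority order; the first one that matches wins.
RULES = [("id", is_id), ("datetime", is_datetime), ("numerical", is_numerical),
         ("categorical", is_categorical), ("boolean", is_boolean)]


def infer_sdtype(values):
    present = [v for v in values if v is not None]  # ignore missing values
    if not present:
        return "unknown"
    for sdtype, rule in RULES:
        if rule(present):
            return sdtype
    return "unknown"


table = {
    "user_id": [101, 102, 103, 104],
    "signup": ["2024-01-05", "2024-02-11", "2024-02-11", "2024-03-02"],
    "age": [34, 34, 27, 51],
    "plan": ["free", "pro", "free", "free"],
    "active": [True, False, True, True],
    "notes": [1, "a", 2.5, "a"],
}
detected = {name: infer_sdtype(col) for name, col in table.items()}
primary_key_candidates = [n for n, t in detected.items() if t == "id"]
```

The ordering decides overlapping cases. Here `user_id` satisfies both the ID and the numerical rule, and ID wins because it is checked first. The same mechanism would label `age` as an ID if its sampled values all happened to be distinct, which is the kind of misdetection a manual review is meant to catch.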

Related Pages

Implemented By
