Principle:Mage ai Mage ai SQL Schema Discovery
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, SQL, Schema_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An automated schema introspection mechanism that queries database information_schema to discover table structures, column types, and constraints for Singer catalog generation.
Description
SQL Schema Discovery leverages the SQL standard information_schema.columns view to automatically detect available tables, their columns, data types, nullability, and key constraints. This eliminates manual schema definition for database sources by programmatically building Singer Catalog entries with proper JSON Schema type mappings (SQL types to JSON Schema types like boolean, integer, number, string, object) and metadata (key properties, replication method, valid replication keys).
Usage
Use this principle when building source connectors for SQL databases (PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, etc.). It should execute in discover mode (-d flag) before the first sync to generate the catalog that drives data extraction. Skip this for API sources that define schemas via static JSON files.
Theoretical Basis
The discovery algorithm follows these steps:
- Query information_schema.columns filtered by table_schema
- Group results by TABLE_NAME
- For each table, map column DATA_TYPE to JSON Schema types:
- boolean → "boolean"
- int/bigint → "integer"
- float/double/numeric/decimal → "number"
- datetime/timestamp/date → "string" with format "date-time"
- json/variant → "object"
- uuid → "string" with format "uuid"
- all others → "string"
- Extract COLUMN_KEY for unique constraints (PRI, UNIQUE)
- Build CatalogEntry with Schema, metadata, key_properties, replication_method