Principle:Mage ai Mage ai API Stream Discovery
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, API, Schema_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A file-based schema discovery mechanism that loads JSON Schema definitions from a local schemas directory to build a Singer catalog for API source connectors.
Description
API Stream Discovery provides catalog generation for non-SQL sources where schemas cannot be introspected from a database. Instead, each stream's schema is defined as a JSON Schema Draft 4 file in a schemas/ directory within the connector package. The discovery process scans this directory, parses each JSON file into a Singer Schema object, and builds CatalogEntry instances with metadata including key_properties, replication_method, and valid_replication_keys derived from the connector's method overrides.
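As an illustration, a stream's schema file and the catalog information attached to it might look like the following. This is a minimal sketch using only the standard library: the `users` stream, its fields, and the plain-dict entry are hypothetical stand-ins for a real schema file and for Singer's Schema and CatalogEntry objects.

```python
import json

# Hypothetical contents of schemas/users.json (JSON Schema Draft 4 style).
USERS_SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "email": {"type": ["null", "string"]},
    "updated_at": {"type": ["null", "string"], "format": "date-time"}
  }
}
"""

# Parse the file contents into a plain dict.
schema = json.loads(USERS_SCHEMA_JSON)

# Plain-dict stand-in for a catalog entry. The values are illustrative;
# in a real connector they come from the connector's method overrides.
entry = {
    "tap_stream_id": "users",
    "schema": schema,
    "key_properties": ["id"],
    "replication_method": "INCREMENTAL",
    "valid_replication_keys": ["updated_at"],
}

print(sorted(entry["schema"]["properties"]))
```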
Usage
Use this principle for API-based, file-based, or any other non-SQL source connectors. Schemas are defined statically (committed to the repository) rather than discovered dynamically at runtime. Some connectors may override discover() to layer dynamic schema discovery on top of the static schemas.
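One way a connector could layer dynamic discovery on top of static schemas is sketched below. The class names, the `fetch_custom_fields` helper, and the dict-based entries are all hypothetical and stand in for the real connector base class and Singer types. The sketch only shows the pattern of calling the static discover() first and then extending its result.

```python
import copy
from typing import Any, Dict, List


class StaticSource:
    """Hypothetical base class: builds entries from committed schemas."""

    # Stand-in for schemas loaded from the schemas/ directory.
    STATIC_SCHEMAS: Dict[str, Dict[str, Any]] = {
        "users": {"type": "object", "properties": {"id": {"type": "integer"}}},
    }

    def discover(self) -> List[Dict[str, Any]]:
        # Deep-copy so callers can modify entries without touching the statics.
        return [
            {"tap_stream_id": name, "schema": copy.deepcopy(schema)}
            for name, schema in self.STATIC_SCHEMAS.items()
        ]


class DynamicSource(StaticSource):
    """Hypothetical connector that adds fields reported by the API."""

    def fetch_custom_fields(self, stream_id: str) -> Dict[str, Any]:
        # Stand-in for an API call returning extra, account-specific fields.
        return {"custom_score": {"type": ["null", "number"]}}

    def discover(self) -> List[Dict[str, Any]]:
        entries = super().discover()  # static, file-based schemas first
        for entry in entries:
            extra = self.fetch_custom_fields(entry["tap_stream_id"])
            entry["schema"]["properties"].update(extra)
        return entries


entries = DynamicSource().discover()
print(sorted(entries[0]["schema"]["properties"]))
```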
Theoretical Basis
The file-based discovery algorithm:
- Locate schemas/ directory relative to the connector's __init__.py
- For each .json file in the directory, load as JSON Schema
- For each schema, call build_catalog_entry() which:
  - Queries get_table_key_properties(stream_id) for primary keys
  - Queries get_forced_replication_method(stream_id) for replication strategy
  - Queries get_valid_replication_keys(stream_id) for bookmark columns
  - Generates standard Singer metadata via get_standard_metadata()
- Return Catalog containing all CatalogEntry objects
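The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the connector's actual code: `CatalogEntry`, `FileBasedSource`, `UsersSource`, and the body of `get_standard_metadata` are stand-ins written for this example, the catalog is returned as a plain list, and the schemas directory is passed in explicitly instead of being resolved relative to `__init__.py`.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class CatalogEntry:
    """Simplified stand-in for Singer's CatalogEntry."""
    tap_stream_id: str
    schema: Dict[str, Any]
    key_properties: List[str]
    replication_method: Optional[str]
    metadata: List[Dict[str, Any]] = field(default_factory=list)


class FileBasedSource:
    """Hypothetical connector base; subclasses override the get_* hooks."""

    def __init__(self, schemas_dir: Path) -> None:
        self.schemas_dir = schemas_dir

    def get_table_key_properties(self, stream_id: str) -> List[str]:
        return []

    def get_forced_replication_method(self, stream_id: str) -> Optional[str]:
        return None

    def get_valid_replication_keys(self, stream_id: str) -> List[str]:
        return []

    def get_standard_metadata(self, schema, key_properties,
                              valid_replication_keys, replication_method):
        # Simplified stand-in: one root entry plus one entry per property.
        root = {"table-key-properties": key_properties,
                "valid-replication-keys": valid_replication_keys}
        if replication_method:
            root["forced-replication-method"] = replication_method
        mdata = [{"breadcrumb": [], "metadata": root}]
        for name in schema.get("properties", {}):
            required = name in key_properties or name in valid_replication_keys
            mdata.append({"breadcrumb": ["properties", name],
                          "metadata": {"inclusion": "automatic" if required
                                       else "available"}})
        return mdata

    def build_catalog_entry(self, stream_id: str, schema: Dict[str, Any]) -> CatalogEntry:
        keys = self.get_table_key_properties(stream_id)
        method = self.get_forced_replication_method(stream_id)
        bookmarks = self.get_valid_replication_keys(stream_id)
        mdata = self.get_standard_metadata(schema, keys, bookmarks, method)
        return CatalogEntry(stream_id, schema, keys, method, mdata)

    def discover(self) -> List[CatalogEntry]:
        # One catalog entry per .json file; the file stem is the stream id.
        return [
            self.build_catalog_entry(path.stem, json.loads(path.read_text()))
            for path in sorted(self.schemas_dir.glob("*.json"))
        ]


class UsersSource(FileBasedSource):
    """Hypothetical connector overriding the hooks for every stream."""

    def get_table_key_properties(self, stream_id: str) -> List[str]:
        return ["id"]

    def get_forced_replication_method(self, stream_id: str) -> Optional[str]:
        return "INCREMENTAL"

    def get_valid_replication_keys(self, stream_id: str) -> List[str]:
        return ["updated_at"]


# Demo: write one schema file to a temporary schemas/ directory and discover it.
with tempfile.TemporaryDirectory() as tmp:
    schemas_dir = Path(tmp) / "schemas"
    schemas_dir.mkdir()
    (schemas_dir / "users.json").write_text(json.dumps({
        "type": "object",
        "properties": {"id": {"type": "integer"},
                       "updated_at": {"type": "string", "format": "date-time"}},
    }))
    catalog = UsersSource(schemas_dir).discover()

for entry in catalog:
    print(entry.tap_stream_id, entry.key_properties, entry.replication_method)
```

Keeping the per-stream decisions in small overridable hooks means a new connector only has to supply schema files and override the hooks whose defaults do not fit.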