Principle:Mage ai Mage ai API Stream Discovery

From Leeroopedia
Revision as of 17:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Mage_ai_Mage_ai_API_Stream_Discovery.md)


Knowledge Sources
Domains: Data_Integration, API, Schema_Management
Last Updated: 2026-02-09 00:00 GMT

Overview

A file-based schema discovery mechanism that loads JSON Schema definitions from a local schemas directory to build a Singer catalog for API source connectors.

Description

API Stream Discovery provides catalog generation for non-SQL sources, where schemas cannot be introspected from a database. Instead, each stream's schema is defined as a JSON Schema Draft 4 file in a schemas/ directory within the connector package. The discovery process scans this directory, parses each JSON file into a Singer Schema object, and builds a CatalogEntry for each stream. The entry's metadata (key_properties, replication_method, and valid_replication_keys) comes from methods that the connector overrides.
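As an illustration of what one stream definition might look like, the sketch below parses a small schema and pairs it with the three metadata values named above. The users stream, its fields, and the metadata values are all hypothetical and chosen only for the example. The final check is an extra sanity step, not something the discovery process is stated to perform.

```python
import json

# Hypothetical contents of schemas/users.json (illustrative, not from a real connector).
USERS_SCHEMA_JSON = """
{
  "type": "object",
  "properties": {
    "id": {"type": "integer"},
    "email": {"type": ["null", "string"]},
    "updated_at": {"type": ["null", "string"], "format": "date-time"}
  }
}
"""

schema = json.loads(USERS_SCHEMA_JSON)

# Values a connector's overrides might supply for this stream (assumed).
stream_metadata = {
    "key_properties": ["id"],
    "replication_method": "INCREMENTAL",
    "valid_replication_keys": ["updated_at"],
}

# Sanity check: every key and bookmark column should be declared in the schema.
declared = set(schema["properties"])
referenced = (
    stream_metadata["key_properties"] + stream_metadata["valid_replication_keys"]
)
missing = [column for column in referenced if column not in declared]
```

An empty missing list means the metadata only references columns that the schema file actually declares.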

Usage

Use this principle for API-based, file-based, or any other non-SQL source connectors. Schemas are defined statically (committed to the repo) rather than discovered dynamically at runtime. Some connectors may override discover() to add dynamic schema discovery on top of the static schemas.
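One way a connector could layer dynamic discovery on top of static schemas is sketched below. The class names, the contacts stream, and fetch_custom_fields() are hypothetical. The stand-in discover() returns a plain dict rather than a Singer Catalog, and the hard-coded lookup takes the place of a real API call.

```python
import copy
from typing import Any, Dict

Schema = Dict[str, Any]


class StaticSource:
    """Stand-in for a connector whose discover() returns static schemas."""

    STATIC_SCHEMAS: Dict[str, Schema] = {
        "contacts": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": ["null", "string"]},
            },
        }
    }

    def discover(self) -> Dict[str, Schema]:
        # Copy so that subclasses can extend the result without mutating the statics.
        return copy.deepcopy(self.STATIC_SCHEMAS)


class DynamicSource(StaticSource):
    """Adds fields that are only known at runtime (for example, custom fields)."""

    def fetch_custom_fields(self, stream_id: str) -> Dict[str, Schema]:
        # Hypothetical: a real connector would query the API here.
        custom = {"contacts": {"loyalty_tier": {"type": ["null", "string"]}}}
        return custom.get(stream_id, {})

    def discover(self) -> Dict[str, Schema]:
        catalog = super().discover()
        for stream_id, schema in catalog.items():
            for name, prop in self.fetch_custom_fields(stream_id).items():
                # Static definitions win if a name is declared in both places.
                schema["properties"].setdefault(name, prop)
        return catalog


catalog = DynamicSource().discover()
```

Letting static definitions take precedence is a design choice made for this sketch; a connector could equally let the dynamic definition override the committed one.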

Theoretical Basis

The file-based discovery algorithm:

  1. Locate schemas/ directory relative to the connector's __init__.py
  2. For each .json file in the directory, load as JSON Schema
  3. For each schema, call build_catalog_entry(), which:
    • Queries get_table_key_properties(stream_id) for primary keys
    • Queries get_forced_replication_method(stream_id) for replication strategy
    • Queries get_valid_replication_keys(stream_id) for bookmark columns
    • Generates standard Singer metadata via get_standard_metadata()
  4. Return Catalog containing all CatalogEntry objects
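The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the connector framework's actual code. CatalogEntry and FileBasedSource are simplified stand-ins, the stream ID is assumed to be the file name without its extension, and the metadata layout is only loosely modeled on Singer conventions. The method names mirror those listed in the steps, but their signatures here are assumptions.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class CatalogEntry:
    """Simplified stand-in for a Singer CatalogEntry (not the library class)."""
    tap_stream_id: str
    schema: Dict[str, Any]
    key_properties: List[str]
    replication_method: Optional[str]
    metadata: List[Dict[str, Any]] = field(default_factory=list)


class FileBasedSource:
    """Hypothetical connector base class; subclasses override the getters."""

    def __init__(self, schemas_dir: Path) -> None:
        # Step 1: a real connector would resolve this relative to its __init__.py.
        self.schemas_dir = Path(schemas_dir)

    def get_table_key_properties(self, stream_id: str) -> List[str]:
        return []

    def get_forced_replication_method(self, stream_id: str) -> Optional[str]:
        return "FULL_TABLE"

    def get_valid_replication_keys(self, stream_id: str) -> List[str]:
        return []

    def get_standard_metadata(self, schema, keys, method, replication_keys):
        # Simplified: one table-level entry, then one entry per property.
        table = {
            "table-key-properties": keys,
            "forced-replication-method": method,
            "valid-replication-keys": replication_keys,
        }
        entries = [{"breadcrumb": [], "metadata": table}]
        for name in schema.get("properties", {}):
            inclusion = "automatic" if name in keys else "available"
            entries.append({"breadcrumb": ["properties", name],
                            "metadata": {"inclusion": inclusion}})
        return entries

    def load_schemas(self) -> Dict[str, Dict[str, Any]]:
        # Step 2: load every .json file; the stream ID is the file stem (assumed).
        return {
            path.stem: json.loads(path.read_text(encoding="utf-8"))
            for path in sorted(self.schemas_dir.glob("*.json"))
        }

    def build_catalog_entry(self, stream_id: str, schema: Dict[str, Any]) -> CatalogEntry:
        # Step 3: query the overridable getters, then generate metadata.
        keys = self.get_table_key_properties(stream_id)
        method = self.get_forced_replication_method(stream_id)
        replication_keys = self.get_valid_replication_keys(stream_id)
        metadata = self.get_standard_metadata(schema, keys, method, replication_keys)
        return CatalogEntry(stream_id, schema, keys, method, metadata)

    def discover(self) -> List[CatalogEntry]:
        # Step 4: return one entry per schema file.
        return [self.build_catalog_entry(sid, s) for sid, s in self.load_schemas().items()]


class UsersSource(FileBasedSource):
    """Example connector overriding the getters (values are illustrative)."""

    def get_table_key_properties(self, stream_id: str) -> List[str]:
        return ["id"]

    def get_forced_replication_method(self, stream_id: str) -> Optional[str]:
        return "INCREMENTAL"

    def get_valid_replication_keys(self, stream_id: str) -> List[str]:
        return ["updated_at"]


# Usage: write a throwaway schemas directory and run discovery against it.
with tempfile.TemporaryDirectory() as tmp:
    users_schema = {"type": "object", "properties": {
        "id": {"type": "integer"},
        "updated_at": {"type": "string", "format": "date-time"}}}
    (Path(tmp) / "users.json").write_text(json.dumps(users_schema), encoding="utf-8")
    catalog = UsersSource(Path(tmp)).discover()
```

Keeping the getters as overridable methods means a new connector only has to supply its schema files and the per-stream answers, while the scanning and entry-building logic stays shared.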

Related Pages

Implemented By
