Principle:Mage ai Mage ai Catalog Selection
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, ETL |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A catalog configuration mechanism that marks streams and columns for inclusion/exclusion in a Singer sync operation using metadata-driven selection flags.
Description
Catalog Selection bridges schema discovery and data extraction. After a tap discovers all available streams, users choose which streams to sync, which columns to include, which replication method to use (FULL_TABLE or INCREMENTAL), and which columns serve as primary keys or bookmark properties. Selection is encoded in Singer metadata entries through two fields: "selected", a boolean flag, and "inclusion", which takes one of the values automatic, available, or unsupported.
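To make the selection step concrete, here is a minimal sketch of applying it to one discovered stream. The `users` stream, its columns, and the helper `select_stream` are hypothetical. The `replication-method`, `replication-key`, and `table-key-properties` metadata keys follow the general Singer convention and are an assumption here; Mage may store replication settings in different fields.

```python
import copy

# Hypothetical catalog entry as a tap might emit it after discovery.
discovered_stream = {
    "tap_stream_id": "users",
    "stream": "users",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": ["null", "string"]},
            "updated_at": {"type": "string", "format": "date-time"},
        },
    },
    "metadata": [
        {"breadcrumb": [], "metadata": {"table-key-properties": ["id"]}},
        {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
        {"breadcrumb": ["properties", "email"], "metadata": {"inclusion": "available"}},
        {"breadcrumb": ["properties", "updated_at"], "metadata": {"inclusion": "available"}},
    ],
}


def select_stream(stream, columns, replication_method, replication_key=None):
    """Return a copy of `stream` with selection metadata applied (illustrative helper)."""
    selected = copy.deepcopy(stream)
    for entry in selected["metadata"]:
        breadcrumb = entry["breadcrumb"]
        if not breadcrumb:
            # Stream-level entry: select the stream and record how to replicate it.
            entry["metadata"]["selected"] = True
            entry["metadata"]["replication-method"] = replication_method
            if replication_key is not None:
                entry["metadata"]["replication-key"] = replication_key
        elif entry["metadata"].get("inclusion") == "available":
            # Only "available" columns carry a user choice; "automatic" and
            # "unsupported" columns are left untouched.
            entry["metadata"]["selected"] = breadcrumb[-1] in columns
    return selected


configured = select_stream(
    discovered_stream,
    columns={"updated_at"},
    replication_method="INCREMENTAL",
    replication_key="updated_at",
)
```

The helper copies the discovered entry rather than mutating it, so the original discovery output stays available for comparison or re-selection.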
Usage
Apply this principle after schema discovery and before starting a sync. It is essential whenever users need to customize which data is extracted rather than syncing everything discovered.
Theoretical Basis
Singer metadata uses a breadcrumb-based system:
- Stream-level metadata (breadcrumb=[]) contains the "selected" flag for the entire stream
- Column-level metadata (breadcrumb=["properties", "column_name"]) contains the "selected" flag and the "inclusion" rule for that column
- Inclusion rules:
  - "automatic" - column is always included (key properties, replication keys)
  - "available" - column can be selected or deselected
  - "unsupported" - column cannot be synced
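The rules above can be sketched as a small resolver that turns a stream's metadata entries into the list of columns to sync. The function name `columns_to_sync` and the sample entries are hypothetical. Treating an "available" column with no "selected" flag as deselected is an assumption of this sketch, not confirmed behaviour of any particular tap.

```python
def columns_to_sync(metadata_entries):
    """Resolve which columns a sync should emit (illustrative sketch)."""
    stream_selected = False
    columns = []
    for entry in metadata_entries:
        breadcrumb = entry["breadcrumb"]
        meta = entry["metadata"]
        if not breadcrumb:
            # Empty breadcrumb: stream-level entry carrying the stream's flag.
            stream_selected = bool(meta.get("selected"))
            continue
        if len(breadcrumb) != 2 or breadcrumb[0] != "properties":
            # Skip anything that is not a top-level column entry.
            continue
        inclusion = meta.get("inclusion")
        if inclusion == "automatic":
            # Always included, regardless of any "selected" flag.
            columns.append(breadcrumb[1])
        elif inclusion == "available" and meta.get("selected"):
            columns.append(breadcrumb[1])
        # "unsupported" columns are never synced, even if flagged as selected.
    return columns if stream_selected else []


entries = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic"}},
    {"breadcrumb": ["properties", "email"],
     "metadata": {"inclusion": "available", "selected": False}},
    {"breadcrumb": ["properties", "updated_at"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "legacy_blob"],
     "metadata": {"inclusion": "unsupported", "selected": True}},
]

print(columns_to_sync(entries))  # ['id', 'updated_at']
```

If the stream-level "selected" flag is absent or false, the resolver returns an empty list, which reflects that column-level choices only matter for a selected stream.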