Implementation:Mage ai Mage ai Dremio Source
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, Dremio, Source_Connector, SQL_Based |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool for extracting data from the Dremio data lakehouse, provided by the Mage integrations source connector framework.
Description
The Dremio source connector extends the SQL-based Source class (mage_integrations.sources.sql.base.Source) to extract data from Dremio. It connects through a DremioConnection wrapper that uses Apache Arrow Flight for high-performance data transfer. The test_connection() method runs a simple SELECT 1 query via the Flight client.

Discovery queries Dremio's information_schema.columns to enumerate the tables in the configured schema together with their column metadata (name, data type, nullability). Bytes values in that metadata are automatically decoded as UTF-8, and table names in queries are prefixed with the configured schema name.

The connector maps SQL data types to Singer schema types: boolean, integer, and numeric/decimal/float types each have their own mapping; datetime/timestamp types map to string with a datetime format; JSON/variant types map to object; UUID types have their own mapping; all other types default to string. The optional source_backend setting (PostgreSQL, MySQL, or MSSQL) selects a dedicated column type mapping function for that backend. The replication method is full-table.
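The type mapping described above can be illustrated with a small standalone sketch. It is not the connector's code: the function name sketch_singer_property, the type-name groups, and the "uuid" format value are assumptions made for illustration, and the real column_type_mapping() may recognize different type names.

```python
from typing import Any, Dict

# Hypothetical type-name groups; the real connector's lists may differ.
BOOLEAN_TYPES = {"bool", "boolean"}
INTEGER_TYPES = {"int", "integer", "bigint", "smallint"}
NUMBER_TYPES = {"numeric", "decimal", "float", "double", "real"}
DATETIME_TYPES = {"date", "datetime", "timestamp"}
OBJECT_TYPES = {"json", "variant"}
UUID_TYPES = {"uuid"}


def sketch_singer_property(column_type: str, nullable: bool = True) -> Dict[str, Any]:
    """Map a SQL type name to a Singer-style JSON schema property (illustrative)."""
    # Strip any precision/scale suffix such as "(10, 2)" and normalize case.
    base = column_type.lower().split("(")[0].strip()

    json_type = "string"  # every unrecognized type falls back to string
    extra: Dict[str, Any] = {}
    if base in BOOLEAN_TYPES:
        json_type = "boolean"
    elif base in INTEGER_TYPES:
        json_type = "integer"
    elif base in NUMBER_TYPES:
        json_type = "number"
    elif base in DATETIME_TYPES:
        extra["format"] = "date-time"  # string carrying a datetime format
    elif base in OBJECT_TYPES:
        json_type = "object"
    elif base in UUID_TYPES:
        extra["format"] = "uuid"  # assumed format label, not confirmed by the source

    # Nullable columns also accept JSON null.
    types = ["null", json_type] if nullable else [json_type]
    return {"type": types, **extra}
```

Checking the fallback first (string) and overriding it per group keeps the "all other types default to string" rule in one place.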
Usage
Use this source connector when building a Mage data pipeline that needs to extract data from Dremio. Configure it with the Dremio connection parameters and a schema value naming the schema whose tables should be discovered. Optionally set source_backend to postgresql, mysql, or mssql to use backend-specific column type mappings.
Code Reference
Source Location
- Repository: mage-ai
- File: mage_integrations/mage_integrations/sources/dremio/__init__.py
- Lines: 1-174
Signature
    class Dremio(Source):
        def test_connection(self):
            ...

        def build_connection(self) -> DremioConnection:
            ...

        def build_discover_query(self, streams: List[str] = None):
            ...

        def column_type_mapping(self, column_type: str, column_format: str = None) -> str:
            ...

        def discover(self, streams: List[str] = None) -> Catalog:
            ...

        def build_table_name(self, stream) -> str:
            ...
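To make the roles of build_discover_query() and build_table_name() more concrete, here is a minimal standalone sketch of what such helpers might produce. The function names, the selected column list, and the quoting approach are assumptions; the actual methods may build their SQL differently, and Dremio may require different identifier quoting for information_schema.columns.

```python
from typing import List, Optional


def _quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes (standard SQL)."""
    return "'" + value.replace("'", "''") + "'"


def sketch_discover_query(schema: str, streams: Optional[List[str]] = None) -> str:
    """Build a metadata query limited to one schema and, optionally, some tables."""
    query = (
        "SELECT table_name, column_name, data_type, is_nullable "
        "FROM information_schema.columns "
        f"WHERE table_schema = {_quote_literal(schema)}"
    )
    if streams:
        # Restrict discovery to the requested streams (tables).
        names = ", ".join(_quote_literal(name) for name in streams)
        query += f" AND table_name IN ({names})"
    return query


def sketch_table_name(schema: str, table: str) -> str:
    """Prefix the table with the configured schema, as the description states."""
    return f"{schema}.{table}"
```

For example, sketch_discover_query("my_schema", ["orders"]) yields a query filtered to the orders table in my_schema, and sketch_table_name("my_schema", "orders") gives the schema-qualified name used in extraction queries.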
Import
from mage_integrations.sources.dremio import Dremio
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Configuration dictionary with Dremio connection settings (passed directly to DremioConnection) |
| catalog | Catalog | No | Singer catalog specifying streams to extract |
| state | dict | No | Previous sync state for incremental extraction |
Configuration Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| schema | str | Yes | Dremio schema name to discover tables from |
| source_backend | str | No | Backend database type for column type mapping: postgresql, mysql, or mssql |
| (additional) | various | Yes | All other config keys are passed directly to DremioConnection (host, port, username, password, etc.) |
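Before constructing the source, it can help to check a configuration dictionary against the table above. The helper below is a hypothetical pre-flight check, not part of mage_integrations; it only encodes the two connector-specific keys documented here and leaves the connection keys to DremioConnection.

```python
from typing import Any, Dict, List

# Backend names taken from the source_backend row above.
SUPPORTED_BACKENDS = {"postgresql", "mysql", "mssql"}


def sketch_check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    if not config.get("schema"):
        problems.append("'schema' is required")
    backend = config.get("source_backend")
    if backend is not None and backend not in SUPPORTED_BACKENDS:
        problems.append(
            f"'source_backend' must be one of {sorted(SUPPORTED_BACKENDS)}, got {backend!r}"
        )
    return problems
```

Returning a list rather than raising on the first error lets a caller report every configuration problem at once.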
Outputs
| Name | Type | Description |
|---|---|---|
| catalog | Catalog | Discovered tables with schemas from information_schema.columns (from discover()) |
| records | Generator[List[Dict]] | Batches of extracted records via Arrow Flight queries (inherited from SQL base Source) |
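The following sketch shows one way rows returned from information_schema.columns could be grouped into per-table column metadata, including the UTF-8 decoding of bytes values mentioned in the description. The row layout (table name, column name, data type, nullability flag), the "YES" convention for nullability, and the plain-dictionary output are assumptions; the real discover() returns a Catalog object.

```python
from typing import Any, Dict, Iterable, Tuple


def _decode(value: Any) -> Any:
    """Decode bytes to str using UTF-8; leave other values unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def sketch_group_columns(rows: Iterable[Tuple[Any, Any, Any, Any]]) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Group (table, column, data_type, is_nullable) rows by table name."""
    streams: Dict[str, Dict[str, Dict[str, Any]]] = {}
    for row in rows:
        table, column, data_type, is_nullable = (_decode(v) for v in row)
        streams.setdefault(table, {})[column] = {
            "data_type": data_type,
            # Assumes the metadata reports nullability as "YES" / "NO".
            "nullable": str(is_nullable).upper() == "YES",
        }
    return streams
```

Each top-level key corresponds to one discovered stream, and its value holds the raw column metadata that a type mapping step would then convert into a schema.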
Usage Examples
    from mage_integrations.sources.dremio import Dremio

    config = {
        "host": "dremio.example.com",
        "port": 32010,
        "username": "admin",
        "password": "password123",
        "schema": "my_schema",
        "source_backend": "postgresql",
    }

    source = Dremio(config=config)

    # Discover available streams
    catalog = source.discover()

    # Test connection
    source.test_connection()
Related Pages
Implements Principle
- Principle:Mage_ai_Mage_ai_Source_Lifecycle_Orchestration
- Principle:Mage_ai_Mage_ai_SQL_Schema_Discovery