Principle:Apache Druid SQL Input Source Selection
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, SQL_Ingestion |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A source selection principle for SQL-based ingestion. It configures the external data source as the EXTERN table function input of an INSERT or REPLACE statement.
Description
SQL Input Source Selection is the first step in the SQL-based data ingestion workflow, which runs on the multi-stage query (MSQ) engine. The classic batch wizard builds a JSON ingestion spec. The SQL workflow instead constructs an INSERT INTO or REPLACE INTO statement whose data source is an EXTERN() table function.
The InputSourceStep component presents the same source types as the classic wizard (S3, Azure, Google Cloud, HTTP, local, inline) but outputs an InputSource object that will be converted into an EXTERN() SQL function call. It also auto-detects the input format and optionally suggests a PARTITIONED BY hint based on the data.
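Format auto-detection and the PARTITIONED BY hint can be approximated by inspecting a few sampled lines. The sketch below uses a hypothetical heuristic. The function names `guessFormatType` and `suggestPartitionedBy` and their rules are assumptions made for illustration, not the web console's actual implementation. It maps raw sample text to an InputFormat `type` and a coarse partitioning suggestion.

```typescript
// Hypothetical heuristic: NOT the Druid web console's real detection logic.
type GuessedFormat = "json" | "csv" | "tsv" | "regex";

// Guess an InputFormat type from the first non-empty sampled line.
function guessFormatType(sampleLines: string[]): GuessedFormat {
  const first = sampleLines.find((line) => line.trim() !== "");
  if (first === undefined) return "regex"; // nothing to inspect; fall back to raw lines
  const trimmed = first.trim();
  if (trimmed.startsWith("{")) return "json"; // newline-delimited JSON objects
  if (trimmed.includes("\t")) return "tsv"; // tab-separated values
  if (trimmed.includes(",")) return "csv"; // comma-separated values
  return "regex"; // treat each line as a single raw column
}

// Hypothetical rule: partition by DAY when a timestamp column was detected,
// otherwise put everything into a single time chunk with ALL.
function suggestPartitionedBy(hasTimestampColumn: boolean): "DAY" | "ALL" {
  return hasTimestampColumn ? "DAY" : "ALL";
}
```

A more complete detector would also have to recognize binary formats such as Parquet or ORC, which cannot be identified by scanning text lines this way.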
Usage
Use this principle at the start of any SQL-based data ingestion workflow. It replaces the source type selection and connection steps of the classic wizard with a unified step that validates connectivity and returns both source and format configuration.
Theoretical Basis
SQL input source selection follows an EXTERN function mapping pattern:
InputSource + InputFormat → EXTERN(source_spec, format_spec, column_declarations)
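The mapping above can be sketched as a small function. It serializes the source, the format, and the column signature to JSON, then embeds each one as a SQL string literal. The names `toExternCall` and `ColumnDecl` and the loose `Record<string, unknown>` typing are hypothetical and exist only for illustration. The detail that matters is escaping: any single quote inside the JSON must be doubled so the text survives as a SQL string literal.

```typescript
// Illustrative types: real InputSource/InputFormat objects carry many more fields.
type JsonObject = Record<string, unknown>;

interface ColumnDecl {
  name: string;
  type: string; // Druid native type, e.g. "string", "long", "double"
}

// Wrap a value as a SQL string literal, doubling embedded single quotes.
function sqlStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical helper: InputSource + InputFormat + columns -> EXTERN(...) call.
function toExternCall(
  inputSource: JsonObject,
  inputFormat: JsonObject,
  columns: ColumnDecl[],
): string {
  const args = [inputSource, inputFormat, columns].map((arg) =>
    sqlStringLiteral(JSON.stringify(arg)),
  );
  return `EXTERN(${args.join(", ")})`;
}
```

For example, passing the S3 source `{ type: "s3", uris: ["s3://bucket/data.json"] }`, the format `{ type: "json" }`, and a single `ts` column of type `string` returns `EXTERN('{"type":"s3","uris":["s3://bucket/data.json"]}', '{"type":"json"}', '[{"name":"ts","type":"string"}]')`.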
Example:
```sql
INSERT INTO my_table
SELECT * FROM TABLE(
  EXTERN(
    '{"type":"s3","uris":["s3://bucket/data.json"]}',
    '{"type":"json"}',
    '[{"name":"ts","type":"string"}, ...]'
  )
)
PARTITIONED BY DAY
```
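Wrapping an EXTERN call in a complete statement is mostly string assembly. The sketch below emits either the INSERT form shown above or a REPLACE form. The `buildIngestSql` and `quoteIdentifier` names and their parameters are hypothetical. Druid's REPLACE syntax requires an OVERWRITE clause, and `OVERWRITE ALL` replaces all existing data in the target table. Identifiers are double-quoted, and any embedded double quotes are doubled.

```typescript
type IngestMode = "insert" | "replace";

// Quote a Druid SQL identifier, doubling any embedded double quotes.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical helper: assemble an MSQ ingestion statement around an EXTERN call.
function buildIngestSql(
  mode: IngestMode,
  table: string,
  externCall: string,
  partitionedBy: string, // e.g. "DAY", "HOUR", "ALL"
): string {
  const head =
    mode === "insert"
      ? `INSERT INTO ${quoteIdentifier(table)}`
      : `REPLACE INTO ${quoteIdentifier(table)} OVERWRITE ALL`;
  return [
    head,
    "SELECT * FROM TABLE(",
    `  ${externCall}`,
    ")",
    `PARTITIONED BY ${partitionedBy}`,
  ].join("\n");
}
```

In practice, the SELECT list usually maps a source column to `__time`, for example with TIME_PARSE. That projection is omitted here to keep the sketch focused on the statement shape.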
The component validates connectivity by sending a sample request to the Druid Sampler API, in the same way as the classic wizard's source connection step.
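The connectivity check can be illustrated by building a sampler request around the chosen source. The sketch below only constructs the JSON body for a POST to `/druid/indexer/v1/sampler` and makes no network call. The field layout (`type: "index_parallel"`, `spec.ioConfig`, `spec.dataSchema`, `samplerConfig`) reflects our understanding of the sampler API and should be checked against the Druid API reference for your version. The `buildSamplerRequest` name, the placeholder `dataSource`, the default row limit, and the timestamp settings are all assumptions.

```typescript
// Hypothetical builder for a Druid sampler request body (body only; no network call).
type JsonObject = Record<string, unknown>;

interface SamplerRequest {
  type: "index_parallel";
  spec: {
    ioConfig: { type: "index_parallel"; inputSource: JsonObject; inputFormat: JsonObject };
    dataSchema: JsonObject;
  };
  samplerConfig: { numRows: number; timeoutMs: number };
}

function buildSamplerRequest(
  inputSource: JsonObject,
  inputFormat: JsonObject,
  numRows = 50, // assumed small sample size for a quick check
  timeoutMs = 15000,
): SamplerRequest {
  return {
    type: "index_parallel",
    spec: {
      ioConfig: { type: "index_parallel", inputSource, inputFormat },
      dataSchema: {
        dataSource: "sample", // placeholder name (assumption)
        // Assumed setup: point at a column that never exists and supply a
        // missingValue, so every row gets a fixed timestamp while sampling.
        timestampSpec: { column: "!!!_no_such_column_!!!", missingValue: "1970-01-01T00:00:00Z" },
        dimensionsSpec: {},
      },
    },
    samplerConfig: { numRows, timeoutMs },
  };
}
```

The resulting object would be serialized with `JSON.stringify` and POSTed with a `Content-Type: application/json` header. An error response or an empty sample would suggest that the source is unreachable or misconfigured.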