Principle:Apache Druid SQL Input Source Selection

Knowledge Sources	Apache Druid, Druid SQL Ingestion
Domains	Data_Ingestion, SQL_Ingestion
Last Updated	2026-02-10 00:00 GMT

Overview

A source selection principle for SQL-based ingestion that configures the external data source as an EXTERN function input to INSERT/REPLACE SQL statements.

Description

SQL Input Source Selection is the first step in the SQL-based data ingestion workflow, which runs on the multi-stage query (MSQ) engine. Unlike the classic batch wizard, which builds a JSON ingestion spec, the SQL workflow constructs an INSERT INTO or REPLACE INTO statement that reads from an EXTERN() table function.

The InputSourceStep component presents the same source types as the classic wizard (S3, Azure, Google Cloud, HTTP, local, inline) but outputs an InputSource object that will be converted into an EXTERN() SQL function call. It also auto-detects the input format and optionally suggests a PARTITIONED BY hint based on the data.
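As a rough illustration of what the step hands to the rest of the workflow, the sketch below models the source types listed above as a TypeScript union. The type and field names of the result (InputSourceStepResult, partitionedByHint) and the helper summarizeStepResult are hypothetical and simplified; they are not the web console's actual definitions.

```typescript
// Simplified, hypothetical shapes: the real web console types carry more fields.
type UriSourceType = 's3' | 'azure' | 'google' | 'http';

type InputSource =
  | { type: UriSourceType; uris: string[] }
  | { type: 'local'; baseDir: string; filter: string }
  | { type: 'inline'; data: string };

interface InputFormat {
  type: string; // e.g. 'json', 'csv', 'parquet'
}

// Assumed result shape of the step; the field names are illustrative only.
interface InputSourceStepResult {
  inputSource: InputSource;
  inputFormat: InputFormat;
  partitionedByHint?: string; // e.g. 'DAY', present only when a hint was suggested
}

// Produce a one-line, human-readable summary of what the step returned.
function summarizeStepResult(result: InputSourceStepResult): string {
  const src = result.inputSource;
  let where: string;
  switch (src.type) {
    case 'local':
      where = `${src.baseDir}/${src.filter}`;
      break;
    case 'inline':
      where = `${src.data.length} inline characters`;
      break;
    default:
      // Remaining variants (s3, azure, google, http) all carry a list of URIs.
      where = src.uris.join(', ');
  }
  const hint = result.partitionedByHint ? `, PARTITIONED BY ${result.partitionedByHint}` : '';
  return `${src.type} [${where}] as ${result.inputFormat.type}${hint}`;
}
```

Modelling the sources as a discriminated union keeps each source type's required fields explicit, so a later conversion step can rely on the compiler to reject an incomplete source.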

Usage

Use this principle at the start of any SQL-based data ingestion workflow. It replaces the source type selection and connection steps of the classic wizard with a unified step that validates connectivity and returns both source and format configuration.

Theoretical Basis

SQL input source selection follows an EXTERN function mapping pattern:

InputSource + InputFormat → EXTERN(source_spec, format_spec, column_declarations)

Example:
  INSERT INTO my_table
  SELECT * FROM TABLE(
    EXTERN('{"type":"s3","uris":["s3://bucket/data.json"]}',
           '{"type":"json"}',
           '[{"name":"ts","type":"string"}, ...]')
  )
  PARTITIONED BY DAY
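The mapping can be sketched as a small TypeScript function that serializes the three specs to JSON and embeds them as SQL string literals. This is a minimal sketch, not the web console's own code: buildExternInsert, sqlStringLiteral, sqlIdentifier, and ColumnDeclaration are hypothetical names, and the column types assume the native type names (string, long, double, float) used in the JSON row signature.

```typescript
interface ColumnDeclaration {
  name: string;
  type: 'string' | 'long' | 'double' | 'float';
}

// Quote a value as a SQL string literal, doubling any embedded single quotes.
function sqlStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Quote an identifier with double quotes, doubling any embedded double quotes.
function sqlIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build an INSERT statement that reads from EXTERN(source, format, columns).
function buildExternInsert(
  table: string,
  inputSource: object,
  inputFormat: object,
  columns: ColumnDeclaration[],
  partitionedBy: string,
): string {
  // Each EXTERN argument is a JSON document passed as a SQL string literal.
  const args = [inputSource, inputFormat, columns].map(a => sqlStringLiteral(JSON.stringify(a)));
  return [
    `INSERT INTO ${sqlIdentifier(table)}`,
    'SELECT * FROM TABLE(',
    `  EXTERN(${args.join(', ')})`,
    ')',
    `PARTITIONED BY ${partitionedBy}`,
  ].join('\n');
}

const sql = buildExternInsert(
  'my_table',
  { type: 's3', uris: ['s3://bucket/data.json'] },
  { type: 'json' },
  [{ name: 'ts', type: 'string' }],
  'DAY',
);
```

Escaping single quotes matters here because inline data or URIs may contain them, and an unescaped quote would end the SQL literal early.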

The component validates connectivity by sending a sample request to the Druid Sampler API, just as the classic wizard's source connection step does.
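A connectivity check of this kind might be prepared as follows. The sketch only builds the HTTP request and does not send it. The endpoint path /druid/indexer/v1/sampler is the sampler endpoint; the payload layout, the placeholder timestampSpec, the numRows and timeoutMs values, and the helper name buildSamplerRequest are assumptions for illustration and should be checked against the Druid documentation before use.

```typescript
interface SamplerRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a sampler request for the given source and format.
function buildSamplerRequest(
  baseUrl: string,
  inputSource: object,
  inputFormat: object,
  numRows = 50,
): SamplerRequest {
  // Assumed payload layout; verify field names against the sampler documentation.
  const payload = {
    type: 'index_parallel',
    spec: {
      ioConfig: { type: 'index_parallel', inputSource, inputFormat },
      dataSchema: {
        dataSource: 'sample', // placeholder name, nothing is written
        timestampSpec: { column: 'ts', format: 'auto' }, // placeholder column
        dimensionsSpec: {},
      },
    },
    samplerConfig: { numRows, timeoutMs: 15000 },
  };
  return {
    // Strip trailing slashes so the path is joined with exactly one slash.
    url: `${baseUrl.replace(/\/+$/, '')}/druid/indexer/v1/sampler`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  };
}

const request = buildSamplerRequest(
  'http://localhost:8888/',
  { type: 'http', uris: ['https://example.com/data.json'] },
  { type: 'json' },
);
```

The request could then be sent with any HTTP client; a successful response with sampled rows indicates that the source is reachable and readable in the chosen format.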

Related Pages

Implemented By

Implementation:Apache_Druid_InputSourceStep
