Principle:Apache Druid SQL Input Source Selection

From Leeroopedia
Revision as of 17:57, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Apache_Druid_SQL_Input_Source_Selection.md)


Knowledge Sources
Domains Data_Ingestion, SQL_Ingestion
Last Updated 2026-02-10 00:00 GMT

Overview

A source selection principle for SQL-based ingestion that configures the external data source as an EXTERN function input to INSERT/REPLACE SQL statements.

Description

SQL Input Source Selection is the first step in the SQL-based data ingestion workflow, which runs on the multi-stage query (MSQ) engine. Unlike the classic batch wizard, which builds a JSON ingestion spec, the SQL workflow constructs an INSERT INTO or REPLACE INTO statement that reads its data from an EXTERN() table function.

The InputSourceStep component presents the same source types as the classic wizard (S3, Azure, Google Cloud, HTTP, local, inline) but outputs an InputSource object that will be converted into an EXTERN() SQL function call. It also auto-detects the input format and optionally suggests a PARTITIONED BY hint based on the data.
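The detection logic itself is not described on this page. As a rough illustration only, format auto-detection could be approximated by inspecting one sampled line. The function name guess_input_format and the heuristic are hypothetical, not the component's actual behavior; the returned dictionaries follow the shape of Druid inputFormat specs (json, tsv, csv, regex).

```python
import json


def guess_input_format(sample_line: str) -> dict:
    """Guess a Druid inputFormat spec from one sample line (hypothetical heuristic)."""
    line = sample_line.strip()
    if line.startswith("{"):
        try:
            json.loads(line)
            return {"type": "json"}
        except ValueError:
            pass  # looks like JSON but does not parse; try the delimited formats
    if "\t" in line:
        return {"type": "tsv", "findColumnsFromHeader": True}
    if "," in line:
        return {"type": "csv", "findColumnsFromHeader": True}
    # Fallback: capture the whole line as one column with a catch-all regex.
    return {"type": "regex", "pattern": "([\\s\\S]*)", "columns": ["line"]}
```

A real detector would look at several rows rather than one, since a single line can be ambiguous (for example, a CSV row whose first field starts with a brace).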

Usage

Use this principle at the start of any SQL-based data ingestion workflow. It replaces the source type selection and connection steps of the classic wizard with a unified step that validates connectivity and returns both source and format configuration.

Theoretical Basis

SQL input source selection follows an EXTERN function mapping pattern:

InputSource + InputFormat → EXTERN(source_spec, format_spec, column_declarations)

Example:
  INSERT INTO my_table
  SELECT * FROM TABLE(
    EXTERN('{"type":"s3","uris":["s3://bucket/data.json"]}',
           '{"type":"json"}',
           '[{"name":"ts","type":"string"}, ...]')
  )
  PARTITIONED BY DAY
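The mapping above can be sketched as a small helper that serializes the three specs to JSON and embeds them as SQL string literals. This is a minimal sketch, not the web console's code: the function name build_extern_insert and its parameters are assumptions made for illustration.

```python
import json


def build_extern_insert(table, input_source, input_format, signature,
                        partitioned_by="DAY"):
    """Render an INSERT statement whose data source is an EXTERN() table function."""

    def sql_literal(obj):
        # Compact JSON, with single quotes doubled so the result is a valid
        # SQL string literal.
        text = json.dumps(obj, separators=(",", ":"))
        return "'" + text.replace("'", "''") + "'"

    # Double any embedded double quotes so the table name is a valid identifier.
    quoted_table = '"' + table.replace('"', '""') + '"'

    return (
        f"INSERT INTO {quoted_table}\n"
        "SELECT * FROM TABLE(\n"
        f"  EXTERN({sql_literal(input_source)},\n"
        f"         {sql_literal(input_format)},\n"
        f"         {sql_literal(signature)})\n"
        ")\n"
        f"PARTITIONED BY {partitioned_by}"
    )


if __name__ == "__main__":
    print(build_extern_insert(
        "my_table",
        {"type": "s3", "uris": ["s3://bucket/data.json"]},
        {"type": "json"},
        [{"name": "ts", "type": "string"}],
    ))
```

Escaping matters here because the JSON specs travel inside SQL string literals: an inline source whose data contains a single quote would otherwise end the literal early.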

The component validates connectivity by sending a sample request to the Druid Sampler API, the same way the classic wizard's source connection step does.
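A sample request of this kind could be assembled roughly as follows. This is a hedged sketch: the helper build_sampler_request, the placeholder dataSchema values, and the default row and timeout limits are assumptions for illustration. The endpoint path and the overall payload shape follow Druid's sampler API as commonly documented, but the exact payload the component sends is not given on this page.

```python
import json

# Sampler endpoint on the Druid Overlord/Router (stated here as an assumption).
SAMPLER_PATH = "/druid/indexer/v1/sampler"


def build_sampler_request(input_source, input_format=None,
                          num_rows=50, timeout_ms=10000):
    """Build a sampler payload that reads a few rows from the given source."""
    io_config = {"type": "index", "inputSource": input_source}
    if input_format is not None:
        io_config["inputFormat"] = input_format
    return {
        "type": "index",
        "spec": {
            "type": "index",
            "ioConfig": io_config,
            # Placeholder schema: only connectivity and raw rows matter here.
            "dataSchema": {
                "dataSource": "sample",
                "timestampSpec": {
                    "column": "timestamp",
                    "missingValue": "1970-01-01T00:00:00Z",
                },
                "dimensionsSpec": {},
            },
        },
        "samplerConfig": {"numRows": num_rows, "timeoutMs": timeout_ms},
    }


if __name__ == "__main__":
    payload = build_sampler_request(
        {"type": "s3", "uris": ["s3://bucket/data.json"]},
        {"type": "json"},
    )
    print("POST", SAMPLER_PATH)
    print(json.dumps(payload, indent=2))
```

The sketch only builds the payload. Sending it with an HTTP client and treating a successful response that contains rows as proof of connectivity is left out, since host, authentication, and error handling depend on the deployment.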

Related Pages

Implemented By
