Principle:Apache Druid SQL Schema Configuration

Knowledge Sources	Apache Druid, Druid SQL Ingestion
Domains	Data_Ingestion, SQL_Ingestion, Schema_Design
Last Updated	2026-02-10 00:00 GMT

Overview

SQL Schema Configuration builds the column selection, type casting, filtering, partitioning, and clustering clauses of an INSERT or REPLACE statement for SQL-based ingestion.

Description

SQL Schema Configuration is the core step of the SQL-based ingestion workflow. It takes the EXTERN table with auto-detected columns and builds a complete INSERT/REPLACE SQL statement through an interactive column editor.

Users can:

Add, remove, rename, and reorder columns in the SELECT clause
Cast column types (VARCHAR, BIGINT, DOUBLE, etc.)
Apply expressions to columns (UPPER, LOWER, TRIM, TIME_PARSE, etc.)
Configure PARTITIONED BY (time-based segmentation)
Configure CLUSTERED BY (secondary ordering within segments)
Enable rollup with GROUP BY and aggregation functions
Preview the resulting data via sample queries

The output is a complete SQL query string ready for submission.
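The column edits listed above can be pictured as a small data model that renders the SELECT clause. The following is a minimal Python sketch, not the editor's actual implementation: `ColumnSpec`, `quote_ident`, and `render_select` are hypothetical names introduced for illustration, and the `{col}` expression template is an assumed convention.

```python
from dataclasses import dataclass
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class ColumnSpec:
    """One row of a hypothetical column editor."""
    source: str                       # column name detected in the EXTERN table
    output: Optional[str] = None      # new name, if the user renamed the column
    cast_to: Optional[str] = None     # e.g. "BIGINT", "DOUBLE", "VARCHAR"
    expression: Optional[str] = None  # template using {col}, e.g. "UPPER({col})"

    def render(self) -> str:
        sql = quote_ident(self.source)
        if self.expression:
            # Wrap the quoted source column in the user's expression.
            sql = self.expression.format(col=sql)
        if self.cast_to:
            sql = f"CAST({sql} AS {self.cast_to})"
        return f"{sql} AS {quote_ident(self.output or self.source)}"


def render_select(columns: List[ColumnSpec]) -> str:
    """Render the SELECT clause; list order determines column order."""
    return "SELECT\n" + ",\n".join("  " + c.render() for c in columns)


columns = [
    ColumnSpec("ts", output="__time", expression="TIME_PARSE({col})"),
    ColumnSpec("user"),
    ColumnSpec("count", output="event_count", cast_to="BIGINT"),
]
select_sql = render_select(columns)
print(select_sql)
```

In this sketch, adding, removing, or reordering columns is just list manipulation on `columns`. Every identifier is double-quoted, which sidesteps reserved words such as `user` and `count`.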

Usage

Apply this principle after input format configuration, once the column declarations are established. It is the most interactive step: users shape the SQL query that will transform and load their data.

Theoretical Basis

SQL schema configuration follows a query builder pattern:

INSERT INTO target_table
SELECT
  TIME_PARSE("ts") AS __time,
  "user" AS "user",
  CAST("count" AS BIGINT) AS event_count
FROM TABLE(EXTERN(...))
PARTITIONED BY DAY
CLUSTERED BY "user"

Components:
  SELECT clause → Column selection, renaming, type casting, expressions
  FROM clause   → EXTERN() function with source and format
  WHERE clause  → Optional row filtering
  GROUP BY      → Rollup mode (when enabled)
  PARTITIONED BY → Segment time granularity
  CLUSTERED BY   → Secondary ordering for query optimization
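To show how the components fit together, including the optional WHERE and GROUP BY clauses, here is a hedged Python sketch of a query builder. `build_ingest_query` and its parameters are hypothetical names, not part of Druid or its web console. The `TABLE(EXTERN(...))` source is kept as an elided placeholder, as in the example above, and the hourly `TIME_FLOOR` rollup is an assumed choice for illustration.

```python
from typing import List, Optional


def build_ingest_query(
    target: str,
    select_exprs: List[str],
    source_sql: str,
    partitioned_by: str,
    where: Optional[str] = None,
    group_by: Optional[List[str]] = None,
    clustered_by: Optional[List[str]] = None,
    replace: bool = False,
) -> str:
    """Assemble the clauses in the order the statement requires."""
    lines = []
    if replace:
        lines.append(f'REPLACE INTO "{target}" OVERWRITE ALL')
    else:
        lines.append(f'INSERT INTO "{target}"')
    lines.append("SELECT")
    lines.append(",\n".join("  " + e for e in select_exprs))
    lines.append(f"FROM {source_sql}")
    if where:                       # optional row filtering
        lines.append(f"WHERE {where}")
    if group_by:                    # present only when rollup is enabled
        lines.append("GROUP BY " + ", ".join(group_by))
    lines.append(f"PARTITIONED BY {partitioned_by}")
    if clustered_by:                # optional secondary ordering
        lines.append("CLUSTERED BY " + ", ".join(clustered_by))
    return "\n".join(lines)


# Rollup example: hourly buckets per user, summing the raw counts.
query = build_ingest_query(
    target="target_table",
    select_exprs=[
        'TIME_FLOOR(TIME_PARSE("ts"), \'PT1H\') AS "__time"',
        '"user"',
        'SUM(CAST("count" AS BIGINT)) AS "event_count"',
    ],
    source_sql="TABLE(EXTERN(...))",  # placeholder, left elided as in the text
    where='"user" IS NOT NULL',
    group_by=["1", "2"],
    partitioned_by="DAY",
    clustered_by=['"user"'],
)
print(query)
```

The function treats each clause as an independent, optional piece, so toggling rollup or filtering in an editor amounts to setting or clearing one argument rather than editing the query text.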

Related Pages

Implemented By

Implementation:Apache_Druid_SchemaStep
