Principle:Apache Druid SQL Schema Configuration
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, SQL_Ingestion, Schema_Design |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A principle for SQL-based ingestion: build the column selection, type casting, filtering, partitioning, and clustering clauses of an INSERT or REPLACE statement that defines the target schema.
Description
SQL Schema Configuration is the core step of the SQL-based ingestion workflow. It takes the EXTERN table with auto-detected columns and builds a complete INSERT/REPLACE SQL statement through an interactive column editor.
Users can:
- Add, remove, rename, and reorder columns in the SELECT clause
- Cast column types (VARCHAR, BIGINT, DOUBLE, etc.)
- Apply expressions to columns (UPPER, LOWER, TRIM, TIME_PARSE, etc.)
- Filter rows with an optional WHERE clause
- Configure PARTITIONED BY (time-based segmentation)
- Configure CLUSTERED BY (secondary ordering within segments)
- Enable rollup with GROUP BY and aggregation functions
- Preview the resulting data via sample queries
The output is a complete SQL query string ready for submission.
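To make the editing steps above concrete, here is a minimal Python sketch of how an editor's state could be assembled into the final query string. It is not the implementation of the actual column editor: the `Column` class and the `build_insert` and `quote_ident` helpers are hypothetical names introduced for illustration, and the `TABLE(EXTERN(...))` source is passed through as an opaque placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """One SELECT-list entry (hypothetical structure, for illustration only)."""
    expression: str  # SQL expression, e.g. 'CAST("count" AS BIGINT)'
    alias: str       # output column name in the target table


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_insert(
    target: str,
    columns: List[Column],
    source: str,
    partitioned_by: str,
    clustered_by: Optional[List[str]] = None,
    replace: bool = False,
) -> str:
    """Assemble an INSERT (or REPLACE ... OVERWRITE ALL) statement as a string."""
    if not columns:
        raise ValueError("at least one column is required")
    if replace:
        head = f"REPLACE INTO {quote_ident(target)} OVERWRITE ALL"
    else:
        head = f"INSERT INTO {quote_ident(target)}"
    # Reordering or renaming columns only changes this list of entries.
    select_list = ",\n".join(
        f"  {c.expression} AS {quote_ident(c.alias)}" for c in columns
    )
    lines = [
        head,
        "SELECT",
        select_list,
        f"FROM {source}",
        f"PARTITIONED BY {partitioned_by}",
    ]
    if clustered_by:
        lines.append(
            "CLUSTERED BY " + ", ".join(quote_ident(c) for c in clustered_by)
        )
    return "\n".join(lines)


sql = build_insert(
    target="target_table",
    columns=[
        Column('TIME_PARSE("ts")', "__time"),
        Column('"user"', "user"),
        Column('CAST("count" AS BIGINT)', "event_count"),
    ],
    source="TABLE(EXTERN(...))",  # placeholder; the real EXTERN arguments are not shown
    partitioned_by="DAY",
    clustered_by=["user"],
)
print(sql)
```

Quoting every alias is a deliberate choice in this sketch: it avoids having to decide which names (such as `user`) collide with SQL keywords.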
Usage
Use this principle after input format configuration, once the column declarations are established. This is the most interactive step: users shape the SQL query that will transform and load their data.
Theoretical Basis
SQL schema configuration follows a query builder pattern:
    INSERT INTO target_table
    SELECT
      TIME_PARSE("ts") AS __time,
      "user" AS "user",
      CAST("count" AS BIGINT) AS event_count
    FROM TABLE(EXTERN(...))
    PARTITIONED BY DAY
    CLUSTERED BY "user"
Components:
- SELECT clause → Column selection, renaming, type casting, expressions
- FROM clause → EXTERN() function with source and format
- WHERE clause → Optional row filtering
- GROUP BY → Rollup mode (when enabled)
- PARTITIONED BY → Segment time granularity
- CLUSTERED BY → Secondary ordering for query optimization
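The WHERE and GROUP BY components do not appear in the sample statement, so the following Python sketch illustrates how they might slot into the SELECT portion when filtering and rollup are enabled. The `build_select` function and its parameters are hypothetical, the filter and the hourly `TIME_FLOOR` bucket are illustrative choices, and the EXTERN arguments remain a placeholder.

```python
from typing import Dict, Optional


def build_select(
    dimensions: Dict[str, str],
    metrics: Dict[str, str],
    source: str,
    where: Optional[str] = None,
    rollup: bool = False,
) -> str:
    """Build the SELECT portion of an ingestion query.

    dimensions: output alias -> SQL expression
    metrics:    output alias -> aggregate expression (emitted only with rollup)
    Aliases are wrapped in double quotes without escaping, which is enough
    for this sketch but not for arbitrary names.
    """
    items = [f'  {expr} AS "{alias}"' for alias, expr in dimensions.items()]
    if rollup:
        items += [f'  {expr} AS "{alias}"' for alias, expr in metrics.items()]
    lines = ["SELECT", ",\n".join(items), f"FROM {source}"]
    if where:
        lines.append(f"WHERE {where}")
    if rollup:
        # Dimensions come first in the SELECT list, so ordinals 1..N
        # identify exactly the grouping columns.
        ordinals = ", ".join(str(i) for i in range(1, len(dimensions) + 1))
        lines.append(f"GROUP BY {ordinals}")
    return "\n".join(lines)


sql = build_select(
    dimensions={
        "__time": "TIME_FLOOR(TIME_PARSE(\"ts\"), 'PT1H')",
        "user": '"user"',
    },
    metrics={"event_count": 'SUM(CAST("count" AS BIGINT))'},
    source="TABLE(EXTERN(...))",  # placeholder; real EXTERN arguments not shown
    where='"user" IS NOT NULL',
    rollup=True,
)
print(sql)
```

With `rollup=False` the same call would drop the aggregate column and the GROUP BY line, leaving a plain row-by-row SELECT.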