Implementation:Apache Druid SchemaStep
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, SQL_Ingestion, Schema_Design |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
React component for building a SQL ingestion query interactively, with column editing and PARTITIONED BY / CLUSTERED BY configuration.
Description
The SchemaStep component (lines 265 to 1083 of schema-step.tsx) is the main interactive step of the SQL data loader. It parses the current SQL query string, displays a column editor grid, and provides controls for PARTITIONED BY, CLUSTERED BY, rollup, and query preview. Column changes (add, remove, rename, cast, apply expression) are written back into the SQL query string via ingestQueryPatternToQuery().
The component uses postToSampler() for lightweight previews and submitTaskQuery() for full query execution previews with larger datasets.
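The round trip from column edits back to SQL text can be illustrated with a much smaller model. The following is a minimal sketch, not the console's actual code: `ColumnSpec`, `IngestSketch`, `sketchToQuery`, and `castColumn` are hypothetical names invented here, and the real ingestQueryPatternToQuery() works on a richer pattern object.

```typescript
// Hypothetical, simplified stand-in for the console's ingest query pattern.
interface ColumnSpec {
  expression: string; // SQL expression, e.g. '"user_id"' or 'TIME_PARSE("timestamp")'
  outputName: string; // column name in the destination table
}

interface IngestSketch {
  mode: 'INSERT' | 'REPLACE';
  destination: string;
  source: string; // body of the FROM clause
  columns: ColumnSpec[];
  partitionedBy: string; // e.g. 'DAY'
  clusteredBy: string[]; // output column names
}

// Quote an identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Render the model as SQL text (illustrative only).
function sketchToQuery(p: IngestSketch): string {
  const selectList = p.columns.map(c => {
    const quoted = quoteIdent(c.outputName);
    // A bare column reference needs no alias.
    return c.expression === quoted ? `  ${quoted}` : `  ${c.expression} AS ${quoted}`;
  });
  const target = quoteIdent(p.destination);
  const lines = [
    p.mode === 'REPLACE' ? `REPLACE INTO ${target} OVERWRITE ALL` : `INSERT INTO ${target}`,
    'SELECT',
    selectList.join(',\n'),
    `FROM ${p.source}`,
    `PARTITIONED BY ${p.partitionedBy}`,
  ];
  if (p.clusteredBy.length > 0) {
    lines.push(`CLUSTERED BY ${p.clusteredBy.map(quoteIdent).join(', ')}`);
  }
  return lines.join('\n');
}

// A "cast" edit: wrap the column's current expression, leaving the input untouched.
function castColumn(p: IngestSketch, outputName: string, sqlType: string): IngestSketch {
  return {
    ...p,
    columns: p.columns.map(c =>
      c.outputName === outputName
        ? { ...c, expression: `CAST(${c.expression} AS ${sqlType})` }
        : c,
    ),
  };
}
```

Each edit returns a new model and the SQL string is regenerated from it, so the query text remains the single value handed back to the parent.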
Usage
Render this component as the main step in the SQL data loader wizard. It receives the SQL query string and updates it through the onQueryStringChange callback.
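The controlled-prop contract described above can be sketched without React. In this minimal illustration the `SchemaStepProps` shape is copied from the Signature section, while `LoaderState` is a hypothetical stand-in for whatever parent owns the query string (in the real console this would presumably be React state in the loader view).

```typescript
// Props shape as listed in the Signature section.
interface SchemaStepProps {
  queryString: string;
  onQueryStringChange(queryString: string): void;
  enableAnalyze: boolean;
  onDone(): void;
}

// Hypothetical parent-side holder: it owns the query string and hands the
// step a fresh props object whenever it renders.
class LoaderState {
  queryString: string;
  done = false;

  constructor(initialQuery: string) {
    this.queryString = initialQuery;
  }

  schemaStepProps(): SchemaStepProps {
    return {
      queryString: this.queryString,
      // The step never mutates the string itself; it reports the new value.
      onQueryStringChange: (q: string) => {
        this.queryString = q;
      },
      enableAnalyze: true, // assumption: analysis enabled for this sketch
      onDone: () => {
        this.done = true;
      },
    };
  }
}
```

The step only reads `queryString` and reports edits through the callback, so the parent stays the single source of truth for the query text.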
Code Reference
Source Location
- Repository: Apache Druid
- File: web-console/src/views/sql-data-loader-view/schema-step/schema-step.tsx
- Lines: L265-L1083
Signature
interface SchemaStepProps {
queryString: string;
onQueryStringChange(queryString: string): void;
enableAnalyze: boolean;
onDone(): void;
}
export const SchemaStep = React.memo(function SchemaStep(
props: SchemaStepProps,
): JSX.Element {
// Body elided (source lines 265-1083): column editor, preview, and SQL generation
});
Import
import { SchemaStep } from './schema-step/schema-step';
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| queryString | string | Yes | Current INSERT/REPLACE SQL query string |
| onQueryStringChange | (queryString: string) => void | Yes | Called with the new SQL query string whenever a user interaction modifies it |
| enableAnalyze | boolean | Yes | Whether the rollup analysis feature is available |
| onDone | () => void | Yes | Called when the user finishes schema configuration |
Outputs
| Name | Type | Description |
|---|---|---|
| queryString | string | Modified SQL query string with updated columns, types, partitioning, and clustering, delivered as the argument to onQueryStringChange |
Usage Examples
Generated SQL Output
-- Example SQL generated by SchemaStep:
REPLACE INTO "my_events" OVERWRITE ALL
SELECT
TIME_PARSE("timestamp") AS "__time",
"user_id",
"event_type",
CAST("value" AS DOUBLE) AS "value"
FROM TABLE(
EXTERN(
'{"type":"s3","uris":["s3://bucket/events.json"]}',
'{"type":"json"}',
'[{"name":"timestamp","type":"string"},{"name":"user_id","type":"string"},{"name":"event_type","type":"string"},{"name":"value","type":"string"}]'
)
)
PARTITIONED BY DAY
CLUSTERED BY "user_id"
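
The trailing clauses of a query like the one above can be read back out of the text, which is roughly what the step must do before it can show partitioning and clustering controls. The sketch below is a naive regex-based illustration only: `readTrailingClauses` is a hypothetical helper, it assumes the clauses appear at the end of the statement, and it splits CLUSTERED BY on commas without understanding nested expressions. A real implementation would use a proper SQL parser.

```typescript
interface TrailingClauses {
  partitionedBy: string | undefined; // e.g. 'DAY' or 'ALL TIME'
  clusteredBy: string[]; // expressions as written, e.g. '"user_id"'
}

// Naive extraction of PARTITIONED BY / CLUSTERED BY from the end of a query.
function readTrailingClauses(sql: string): TrailingClauses {
  // Capture everything after PARTITIONED BY up to CLUSTERED BY or end of text.
  const partition = /\bPARTITIONED\s+BY\s+([\s\S]+?)(?=\s+CLUSTERED\s+BY\b|\s*$)/i.exec(sql);
  // Capture everything after CLUSTERED BY up to end of text.
  const cluster = /\bCLUSTERED\s+BY\s+([\s\S]+?)\s*$/i.exec(sql);
  const clusterText = cluster?.[1] ?? '';
  return {
    partitionedBy: partition?.[1],
    clusteredBy: clusterText
      .split(',')
      .map(s => s.trim())
      .filter(s => s.length > 0),
  };
}
```

Applied to the example query, this yields `DAY` as the partitioning and a single clustering expression, `"user_id"`.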