Principle:Apache Druid SQL Schema Configuration

From Leeroopedia


Knowledge Sources
Domains: Data_Ingestion, SQL_Ingestion, Schema_Design
Last Updated: 2026-02-10 00:00 GMT

Overview

A schema configuration principle for SQL-based ingestion: it assembles the column selection, type casting, filtering, partitioning, and clustering clauses of an INSERT or REPLACE statement.

Description

SQL Schema Configuration is the core step of the SQL-based ingestion workflow. It takes the EXTERN table with auto-detected columns and builds a complete INSERT/REPLACE SQL statement through an interactive column editor.

Users can:

  • Add, remove, rename, and reorder columns in the SELECT clause
  • Cast column types (VARCHAR, BIGINT, DOUBLE, etc.)
  • Apply expressions to columns (UPPER, LOWER, TRIM, TIME_PARSE, etc.)
  • Configure PARTITIONED BY (time-based segmentation)
  • Configure CLUSTERED BY (secondary partitioning and sort order within each time chunk)
  • Enable rollup with GROUP BY and aggregation functions
  • Preview the resulting data via sample queries

The output is a complete SQL query string ready for submission.
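
The column edits listed above can be modeled as operations on an ordered list of column definitions that is rendered into a SELECT clause. The following is a minimal sketch of that idea, not the editor's actual implementation: `Column`, `quote_ident`, and `render_select` are hypothetical names, and the `country` column is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class Column:
    """One row of a hypothetical column editor."""
    expression: str                  # SQL expression over the source columns
    output_name: str                 # column name in the target table
    cast_type: Optional[str] = None  # e.g. "BIGINT"; None means no cast

    def to_sql(self) -> str:
        expr = self.expression
        if self.cast_type is not None:
            expr = f"CAST({expr} AS {self.cast_type})"
        return f"{expr} AS {quote_ident(self.output_name)}"


def render_select(columns: List[Column]) -> str:
    """Render the SELECT clause; list order is the output column order."""
    if not columns:
        raise ValueError("at least one column is required")
    body = ",\n".join("  " + col.to_sql() for col in columns)
    return "SELECT\n" + body


columns = [
    Column('TIME_PARSE("ts")', "__time"),
    Column('"user"', "user"),
    Column('"count"', "event_count", cast_type="BIGINT"),
]

# Editing operations are plain list and field updates.
columns[1].output_name = "user_name"                    # rename
columns.append(Column('UPPER("country")', "country"))   # add an expression column

select_sql = render_select(columns)
print(select_sql)
```

Removing and reordering columns map to `list.pop` and `list.insert` in the same way. Quoting every output name keeps reserved words such as `user` safe to use as column names.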

Usage

Apply this principle after input format configuration, once the column declarations are established. It is the most interactive step: users shape the SQL query that transforms and loads their data.

Theoretical Basis

SQL schema configuration follows a query builder pattern:

INSERT INTO target_table
SELECT
  TIME_PARSE("ts") AS __time,
  -- USER is a reserved word, so the identifier stays double-quoted
  "user" AS "user",
  CAST("count" AS BIGINT) AS event_count
FROM TABLE(EXTERN(...))
PARTITIONED BY DAY
CLUSTERED BY "user"

Components:
  SELECT clause → Column selection, renaming, type casting, expressions
  FROM clause   → EXTERN() function with source and format
  WHERE clause  → Optional row filtering
  GROUP BY      → Rollup mode (when enabled)
  PARTITIONED BY → Segment time granularity
  CLUSTERED BY   → Secondary partitioning and sort order within each time chunk
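
The component list above implies a fixed clause order, with WHERE, GROUP BY, and CLUSTERED BY emitted only when configured. A minimal sketch of such an assembler follows, under stated assumptions: `build_ingest_query` and its parameters are hypothetical names rather than part of any Druid API, the `replace` flag always emits `OVERWRITE ALL`, and the `EXTERN(...)` arguments stay elided as in the example, so the generated text is illustrative and not submittable as-is.

```python
from typing import List, Optional, Sequence


def build_ingest_query(
    target: str,
    select_exprs: Sequence[str],
    source: str,
    partitioned_by: str,
    where: Optional[str] = None,
    group_by: Sequence[str] = (),
    clustered_by: Sequence[str] = (),
    replace: bool = False,
) -> str:
    """Assemble an INSERT or REPLACE statement clause by clause."""
    if not select_exprs:
        raise ValueError("at least one SELECT expression is required")

    table = '"' + target.replace('"', '""') + '"'
    head = f"REPLACE INTO {table} OVERWRITE ALL" if replace else f"INSERT INTO {table}"

    lines: List[str] = [head, "SELECT"]
    lines.append(",\n".join(f"  {expr}" for expr in select_exprs))
    lines.append(f"FROM {source}")
    if where:
        lines.append(f"WHERE {where}")                       # optional row filter
    if group_by:
        lines.append("GROUP BY " + ", ".join(group_by))      # rollup mode
    lines.append(f"PARTITIONED BY {partitioned_by}")         # always required
    if clustered_by:
        lines.append("CLUSTERED BY " + ", ".join(clustered_by))
    return "\n".join(lines)


# Rollup variant of the example: hourly buckets per user with a summed count.
query = build_ingest_query(
    target="target_table",
    select_exprs=[
        "TIME_FLOOR(TIME_PARSE(\"ts\"), 'PT1H') AS \"__time\"",
        '"user"',
        'SUM(CAST("count" AS BIGINT)) AS "event_count"',
    ],
    source="TABLE(EXTERN(...))",  # placeholder kept elided, as in the example
    where='"user" IS NOT NULL',
    group_by=["1", "2"],
    partitioned_by="DAY",
    clustered_by=['"user"'],
)
print(query)
```

Keeping each clause optional except SELECT, FROM, and PARTITIONED BY mirrors the component list: a plain ingestion omits `where` and `group_by`, while enabling rollup only adds the grouping columns and aggregate expressions.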
