Workflow: Apache Druid SQL Query Execution
| Knowledge Sources | |
|---|---|
| Domains | SQL, Real_Time_Analytics, Data_Analysis |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
End-to-end process for writing, executing, and analyzing SQL queries against Apache Druid datasources using the web console's Workbench view, supporting both the native (interactive) and multi-stage query (MSQ) execution engines.
Description
This workflow covers the SQL query execution path in the Druid web console Workbench. The Workbench provides a tabbed SQL editor with syntax highlighting, autocompletion, schema browsing, query context configuration, and multi-engine execution support. Users write Druid SQL queries, choose an execution engine (native for real-time, MSQ for large-scale), execute the query, and analyze results in tabular or JSON format. The Workbench also supports EXPLAIN plans, query history, result export, and external data connectivity.
Key capabilities:
- Tabbed query editor with Druid SQL syntax highlighting and autocompletion
- Schema browser (Column Tree) showing all datasources, columns, and types
- Dual execution engines: native (interactive, sub-second) and MSQ (asynchronous, distributed, large-scale)
- Query context parameter configuration (timezone, execution mode, task limits)
- EXPLAIN plan visualization for query optimization
- Query result export (CSV, TSV, JSON, SQL VALUES)
- Query history with replay capability
- External data source connectivity via EXTERN()
- MSQ execution stage visualization with progress tracking
Usage
Execute this workflow when you need to query Druid datasources using SQL, whether for ad-hoc analysis, dashboard prototyping, data validation, or complex aggregations. The native engine is ideal for interactive, sub-second queries; the MSQ engine handles large result sets and complex joins that exceed single-process memory limits.
Execution Steps
Step 1: Schema Discovery
Browse available datasources and their schemas using the Column Tree panel. The tree displays all tables from INFORMATION_SCHEMA with their column names, data types, and right-click context menus for generating query snippets. Users can also connect external data sources for ad-hoc querying.
Key considerations:
- The Column Tree queries INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS
- Right-click menus generate common SQL snippets (SELECT, GROUP BY, filter by value)
- External data connections use EXTERN() function with inline schema definition
- Metadata refreshes automatically when datasource changes are detected
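As a sketch of the external-data path above, the following Python snippet assembles a `TABLE(EXTERN(...))` query from its three JSON arguments (input source, input format, and row signature). The HTTP URI and column names are placeholders, not values from this workflow:

```python
import json

# Hypothetical input source and schema -- substitute your own URI and columns.
input_source = {"type": "http", "uris": ["https://example.com/data.json"]}
input_format = {"type": "json"}
columns = [
    {"name": "timestamp", "type": "string"},
    {"name": "page", "type": "string"},
]

# EXTERN() takes three JSON string arguments: the input source, the input
# format, and the row signature describing the incoming columns.
sql = (
    "SELECT * FROM TABLE(EXTERN(\n"
    f"  '{json.dumps(input_source)}',\n"
    f"  '{json.dumps(input_format)}',\n"
    f"  '{json.dumps(columns)}'\n"
    "))\nLIMIT 100"
)
print(sql)
```

The console's "Connect external data" flow generates a query of this shape after sampling the source, so you rarely write it by hand.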
Step 2: Query Composition
Write SQL queries in the tabbed editor with syntax highlighting and autocompletion. The editor provides Druid SQL completions for functions, table names, column names, and keywords. Multiple query tabs allow parallel work on different queries.
Key considerations:
- Druid SQL extends standard SQL with functions like TIME_FLOOR, APPROX_COUNT_DISTINCT
- Autocompletion suggests tables, columns, SQL functions, and query context parameters
- Tab persistence saves queries to localStorage across browser sessions
- Demo queries are available for new users to explore common patterns
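A representative query using the Druid-specific extensions mentioned above might look like the following (held in a Python string here; the `wikipedia` datasource and column names are placeholders):

```python
# TIME_FLOOR buckets __time into hourly bins; APPROX_COUNT_DISTINCT is a
# fast, approximate distinct count -- both are Druid SQL extensions.
query = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS "hour",
  APPROX_COUNT_DISTINCT("user") AS distinct_users,
  COUNT(*) AS edits
FROM "wikipedia"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""
print(query.strip())
```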
Step 3: Query Context Configuration
Configure the query execution context including the execution engine, timezone, result format, and MSQ-specific parameters. The Run Panel provides engine selection (native vs. MSQ), max task count, and advanced context editing.
Key considerations:
- Native engine: synchronous, single-process, optimized for low-latency interactive queries
- MSQ engine: asynchronous, multi-stage distributed, required for INSERT/REPLACE and large results
- maxNumTasks controls MSQ parallelism (higher values use more cluster resources)
- sqlTimeZone affects time function behavior (default UTC)
- Custom context parameters can be edited as JSON for advanced tuning
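The advanced context editor ultimately holds a JSON object like the one below. `sqlTimeZone` and `maxNumTasks` come from the Druid documentation; the values chosen here are purely illustrative:

```python
import json

# Illustrative query context; edit these as JSON in the Run Panel's
# advanced context editor.
context = {
    "sqlTimeZone": "America/Los_Angeles",  # overrides the UTC default
    "maxNumTasks": 4,                      # caps MSQ parallelism (tasks)
}
print(json.dumps(context, indent=2))
```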
Step 4: Query Execution
Execute the query using the selected engine. The Workbench submits the query to the appropriate Druid API endpoint and tracks execution status. For native queries, results return immediately. For MSQ queries, the Workbench polls for completion and displays stage-by-stage progress.
What happens:
- Native queries: POST to the synchronous SQL endpoint, results stream back
- MSQ queries: POST to the async SQL statements endpoint, returns a query ID
- The execution timer shows elapsed time during query execution
- Running queries can be canceled via the cancel button
- Errors display with position information highlighting the problematic SQL
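The two submission paths above can be sketched as request payloads. The endpoint paths (`/druid/v2/sql/` and `/druid/v2/sql/statements/`) follow the Druid API docs; the query text, result format, and context values are placeholder assumptions:

```python
import json

sql = 'SELECT COUNT(*) AS cnt FROM "wikipedia"'  # placeholder query

# Native engine: synchronous POST to /druid/v2/sql/; rows stream back
# in the requested resultFormat.
native_payload = {
    "query": sql,
    "resultFormat": "objectLines",
}

# MSQ engine: POST to /druid/v2/sql/statements/; the response carries a
# queryId that the Workbench polls for status and per-stage progress.
msq_payload = {
    "query": sql,
    "context": {"executionMode": "ASYNC", "maxNumTasks": 2},
}

print(json.dumps(native_payload))
print(json.dumps(msq_payload))
```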
Step 5: Result Analysis
Analyze query results in the interactive result table. The table supports column filtering, sorting, pagination, and cell-level actions (copy value, filter by value). For MSQ queries, execution stage details show per-stage metrics including rows processed, CPU time, and shuffle statistics.
Key considerations:
- Result table supports column visibility toggling and width adjustment
- Cell context menus allow filtering, copying, and generating follow-up queries
- MSQ execution stages display worker counts, input/output rows, and duration per stage
- Large result sets can be downloaded via the destination pages panel
- Results can be exported as CSV, TSV, JSON, or SQL VALUES clauses
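The CSV export option can be mirrored in a few lines. The result rows below are made-up sample data, not output from a real query:

```python
import csv
import io

# Sample result rows as dictionaries, the shape the result table works with.
rows = [
    {"channel": "#en.wikipedia", "edits": 1234},
    {"channel": "#de.wikipedia", "edits": 567},
]

# Write a header row followed by one CSV line per result row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["channel", "edits"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```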
Step 6: Query History and Iteration
Review and replay previous queries from the query history. The Workbench automatically saves executed queries with their results and timestamps. Users iterate by modifying and re-executing queries, building up an analysis incrementally.
Key considerations:
- Query history persists in localStorage with configurable retention
- Each history entry records the query text, execution time, and result count
- Queries can be restored into new or existing tabs for modification
- The EXPLAIN dialog helps optimize slow queries by revealing the execution plan
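A history entry of the kind described above might round-trip through JSON like this. The field names and values here are illustrative, not the console's actual localStorage schema:

```python
import json

# Hypothetical shape of one persisted history entry: query text,
# execution timestamp, duration, and result count.
entry = {
    "query": 'SELECT COUNT(*) FROM "wikipedia"',
    "executedAt": "2026-02-10T10:00:00Z",
    "durationMs": 843,
    "resultRows": 1,
}

# Serialize/deserialize the way a browser would around localStorage.
serialized = json.dumps(entry)
restored = json.loads(serialized)
print(restored["query"])
```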