Workflow: Apache Druid SQL Query Execution
| Knowledge Sources | |
|---|---|
| Domains | SQL, Real_Time_Analytics, Data_Analysis |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
End-to-end process for writing, executing, and analyzing SQL queries against Apache Druid datasources using the web console's Workbench view, supporting both the native (interactive) and multi-stage query (MSQ) execution engines.
Description
This workflow covers the SQL query execution path in the Druid web console Workbench. The Workbench provides a tabbed SQL editor with syntax highlighting, autocompletion, schema browsing, query context configuration, and multi-engine execution support. Users write Druid SQL queries, choose an execution engine (native for real-time, MSQ for large-scale), execute the query, and analyze results in tabular or JSON format. The Workbench also supports EXPLAIN plans, query history, result export, and external data connectivity.
Key capabilities:
- Tabbed query editor with Druid SQL syntax highlighting and autocompletion
- Schema browser (Column Tree) showing all datasources, columns, and types
- Dual execution engines: native (interactive, sub-second) and MSQ (asynchronous, distributed, large-scale)
- Query context parameter configuration (timezone, execution mode, task limits)
- EXPLAIN plan visualization for query optimization
- Query result export (CSV, TSV, JSON, SQL VALUES)
- Query history with replay capability
- External data source connectivity via EXTERN()
- MSQ execution stage visualization with progress tracking
Usage
Execute this workflow when you need to query Druid datasources using SQL, whether for ad-hoc analysis, dashboard prototyping, data validation, or complex aggregations. The native engine is ideal for interactive, sub-second queries; the MSQ engine handles large result sets and complex joins that exceed single-process memory limits.
Execution Steps
Step 1: Schema Discovery
Browse available datasources and their schemas using the Column Tree panel. The tree displays all tables from INFORMATION_SCHEMA with their column names, data types, and right-click context menus for generating query snippets. Users can also connect external data sources for ad-hoc querying.
Key considerations:
- The Column Tree queries INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS
- Right-click menus generate common SQL snippets (SELECT, GROUP BY, filter by value)
- External data connections use EXTERN() function with inline schema definition
- Metadata refreshes automatically when datasource changes are detected
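As a sketch of the external-data path above, the following Python snippet assembles a `TABLE(EXTERN(...))` query from its three JSON arguments (input source, input format, and row signature). The HTTP URI and column names are placeholders, not values from this workflow:

```python
import json

# Hypothetical input source and schema -- substitute your own URI and columns.
input_source = {"type": "http", "uris": ["https://example.com/data.json"]}
input_format = {"type": "json"}
columns = [
    {"name": "timestamp", "type": "string"},
    {"name": "page", "type": "string"},
]

# EXTERN() takes three JSON string arguments: the input source, the input
# format, and the row signature describing the incoming columns.
sql = (
    "SELECT * FROM TABLE(EXTERN(\n"
    f"  '{json.dumps(input_source)}',\n"
    f"  '{json.dumps(input_format)}',\n"
    f"  '{json.dumps(columns)}'\n"
    "))\nLIMIT 100"
)
print(sql)
```

The console's "Connect external data" flow generates a query of this shape after sampling the source, so you rarely write it by hand.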
Step 2: Query Composition
Write SQL queries in the tabbed editor with syntax highlighting and autocompletion. The editor provides Druid SQL completions for functions, table names, column names, and keywords. Multiple query tabs allow parallel work on different queries.
Key considerations:
- Druid SQL extends standard SQL with functions like TIME_FLOOR, APPROX_COUNT_DISTINCT
- Autocompletion suggests tables, columns, SQL functions, and query context parameters
- Tab persistence saves queries to localStorage across browser sessions
- Demo queries are available for new users to explore common patterns
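A representative query using the Druid-specific extensions mentioned above might look like the following (held in a Python string here; the `wikipedia` datasource and column names are placeholders):

```python
# TIME_FLOOR buckets __time into hourly bins; APPROX_COUNT_DISTINCT is a
# fast, approximate distinct count -- both are Druid SQL extensions.
query = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS "hour",
  APPROX_COUNT_DISTINCT("user") AS distinct_users,
  COUNT(*) AS edits
FROM "wikipedia"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""
print(query.strip())
```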
Step 3: Query Context Configuration
Configure the query execution context including the execution engine, timezone, result format, and MSQ-specific parameters. The Run Panel provides engine selection (native vs. MSQ), max task count, and advanced context editing.
Key considerations:
- Native engine: synchronous, single-process, optimized for low-latency interactive queries
- MSQ engine: asynchronous, multi-stage distributed, required for INSERT/REPLACE and large results
- maxNumTasks controls MSQ parallelism (higher values use more cluster resources)
- sqlTimeZone affects time function behavior (default UTC)
- Custom context parameters can be edited as JSON for advanced tuning
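The advanced context editor ultimately holds a JSON object like the one below. `sqlTimeZone` and `maxNumTasks` come from the Druid documentation; the values chosen here are purely illustrative:

```python
import json

# Illustrative query context; edit these as JSON in the Run Panel's
# advanced context editor.
context = {
    "sqlTimeZone": "America/Los_Angeles",  # overrides the UTC default
    "maxNumTasks": 4,                      # caps MSQ parallelism (tasks)
}
print(json.dumps(context, indent=2))
```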
Step 4: Query Execution
Execute the query using the selected engine. The Workbench submits the query to the appropriate Druid API endpoint and tracks execution status. For native queries, results return immediately. For MSQ queries, the Workbench polls for completion and displays stage-by-stage progress.
What happens:
- Native queries: POST to the synchronous SQL endpoint, results stream back
- MSQ queries: POST to the async SQL statements endpoint, returns a query ID
- The execution timer shows elapsed time during query execution
- Running queries can be canceled via the cancel button
- Errors display with position information highlighting the problematic SQL
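The two submission paths above can be sketched as request payloads. The endpoint paths (`/druid/v2/sql/` and `/druid/v2/sql/statements/`) follow the Druid API docs; the query text, result format, and context values are placeholder assumptions:

```python
import json

sql = 'SELECT COUNT(*) AS cnt FROM "wikipedia"'  # placeholder query

# Native engine: synchronous POST to /druid/v2/sql/; rows stream back
# in the requested resultFormat.
native_payload = {
    "query": sql,
    "resultFormat": "objectLines",
}

# MSQ engine: POST to /druid/v2/sql/statements/; the response carries a
# queryId that the Workbench polls for status and per-stage progress.
msq_payload = {
    "query": sql,
    "context": {"executionMode": "ASYNC", "maxNumTasks": 2},
}

print(json.dumps(native_payload))
print(json.dumps(msq_payload))
```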
Step 5: Result Analysis
Analyze query results in the interactive result table. The table supports column filtering, sorting, pagination, and cell-level actions (copy value, filter by value). For MSQ queries, execution stage details show per-stage metrics including rows processed, CPU time, and shuffle statistics.
Key considerations:
- Result table supports column visibility toggling and width adjustment
- Cell context menus allow filtering, copying, and generating follow-up queries
- MSQ execution stages display worker counts, input/output rows, and duration per stage
- Large result sets can be downloaded via the destination pages panel
- Results can be exported as CSV, TSV, JSON, or SQL VALUES clauses
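The CSV export option can be mirrored in a few lines. The result rows below are made-up sample data, not output from a real query:

```python
import csv
import io

# Sample result rows as dictionaries, the shape the result table works with.
rows = [
    {"channel": "#en.wikipedia", "edits": 1234},
    {"channel": "#de.wikipedia", "edits": 567},
]

# Write a header row followed by one CSV line per result row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["channel", "edits"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```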
Step 6: Query History and Iteration
Review and replay previous queries from the query history. The Workbench automatically saves executed queries with their results and timestamps. Users iterate by modifying and re-executing queries, building up an analysis incrementally.
Key considerations:
- Query history persists in localStorage with configurable retention
- Each history entry records the query text, execution time, and result count
- Queries can be restored into new or existing tabs for modification
- The EXPLAIN dialog helps optimize slow queries by revealing the execution plan
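A history entry of the kind described above might round-trip through JSON like this. The field names and values here are illustrative, not the console's actual localStorage schema:

```python
import json

# Hypothetical shape of one persisted history entry: query text,
# execution timestamp, duration, and result count.
entry = {
    "query": 'SELECT COUNT(*) FROM "wikipedia"',
    "executedAt": "2026-02-10T10:00:00Z",
    "durationMs": 843,
    "resultRows": 1,
}

# Serialize/deserialize the way a browser would around localStorage.
serialized = json.dumps(entry)
restored = json.loads(serialized)
print(restored["query"])
```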