Heuristic:Apache Druid Capability Detection Strategy
| Knowledge Sources | |
|---|---|
| Domains | Cluster Management, Web Console, Feature Detection, Service Discovery |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
The Druid web console uses a multi-step probing strategy to detect which cluster capabilities are available (SQL, native queries, management proxy, MSQ, DART) and adapts its UI feature set accordingly, using a 15-second timeout on all probe requests.
Description
When the web console loads, it must determine what the connected Druid cluster supports. Different deployments may have SQL disabled, may lack the management proxy (meaning Coordinator/Overlord are unreachable from the Router), or may not have Multi-Stage Query (MSQ) extensions installed. The Capabilities class in capabilities.ts orchestrates this detection through a series of HTTP probes.
The detection proceeds in a defined order:
- SQL detection (`detectQueryType`): POST `SELECT 1337` to `/druid/v2/sql`. If it succeeds, both SQL and native queries are available (`nativeAndSql`). If it returns 405 or 404, SQL is disabled. The console then confirms the node is alive with GET `/status` and probes `/druid/v2` with a `dataSourceMetadata` query. A successful native probe yields `nativeOnly`, and a failed one yields `none`. Any other SQL error, or a failed `/status` check, leaves the query type undetermined (`undefined`).
- Node detection: If the query type is `none` (not on a Router), the console separately probes `/druid/coordinator/v1/isLeader` and `/druid/indexer/v1/isLeader` to determine which node it is running on.
- Management proxy detection (`detectManagementProxy`): GET `/proxy/enabled`. A successful response or a 400 status means the proxy is active. This is the 400 trick: older proxies that predate the `/proxy/enabled` route reject the unknown path with 400, which still proves the proxy servlet is handling requests.
- MSQ Task detection: GET `/druid/v2/sql/task/enabled` to check for the Multi-Stage Query task engine.
- MSQ DART detection: GET `/druid/v2/sql/engines` and look for the `msq-dart` engine name.
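The SQL/native branch of this list reduces to a small decision over three probe outcomes. Below is a hypothetical pure-function model of that decision. The names `ProbeResult` and `classifyQueryType` are illustrative and do not come from capabilities.ts. The model mirrors the control flow of the `detectQueryType` code quoted under Code Evidence, where any failure of the final native probe yields `'none'`.

```typescript
// Hypothetical model of the detectQueryType decision tree (not the console's actual API).
// 'ok' = 2xx response, a number = HTTP error status, 'unreachable' = timeout or network error.
type ProbeResult = 'ok' | number | 'unreachable';
type QueryType = 'nativeAndSql' | 'nativeOnly' | 'none';

// 404/405 on a query endpoint is read as "endpoint disabled", not as an outage.
function isDisabled(result: ProbeResult): boolean {
  return result === 404 || result === 405;
}

function classifyQueryType(
  sql: ProbeResult, // POST /druid/v2/sql with SELECT 1337
  status: ProbeResult, // GET /status, only consulted when SQL is disabled
  native: ProbeResult, // POST /druid/v2 with a dataSourceMetadata query
): QueryType | undefined {
  if (sql === 'ok') return 'nativeAndSql';
  if (!isDisabled(sql)) return undefined; // unexpected SQL failure: capability unknown
  if (status !== 'ok') return undefined; // the node is not answering at all
  if (native === 'ok') return 'nativeOnly';
  // Mirrors the quoted source: any native-probe failure at this point maps to 'none'.
  return 'none';
}

// Example: a node that answers /status but rejects both query endpoints with 404.
console.log(classifyQueryType(404, 'ok', 404)); // none
```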
All probes use STATUS_TIMEOUT = 15000 ms. The resulting capability object determines which query engines are offered to the user: native, sql-native, sql-msq-task, and/or sql-msq-dart.
The console defines several pre-built capability modes:
| Mode | queryType | coordinator | overlord | Description |
|---|---|---|---|---|
| full | nativeAndSql | true | true | Full-featured Router deployment |
| no-sql | nativeOnly | true | true | SQL is disabled |
| no-proxy | nativeAndSql | false | false | Management proxy not configured |
| coordinator | none | true | false | Console on Coordinator only |
| overlord | none | false | true | Console on Overlord only |
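The table can be read as a lookup from a detected profile to a preset name. The sketch below assumes that on a Router both `coordinator` and `overlord` follow the management-proxy probe, and that for `queryType: 'none'` they follow the node probe. The table and the detection list suggest this, but it has not been confirmed against capabilities.ts. The names `deriveProfile`, `matchMode`, and `MODES` are hypothetical.

```typescript
type QueryType = 'nativeAndSql' | 'nativeOnly' | 'none';
type NodeKind = 'coordinator' | 'overlord' | undefined;

interface CapabilityProfile {
  queryType: QueryType;
  coordinator: boolean;
  overlord: boolean;
}

// The five presets from the capability-mode table.
const MODES: Record<string, CapabilityProfile> = {
  full: { queryType: 'nativeAndSql', coordinator: true, overlord: true },
  'no-sql': { queryType: 'nativeOnly', coordinator: true, overlord: true },
  'no-proxy': { queryType: 'nativeAndSql', coordinator: false, overlord: false },
  coordinator: { queryType: 'none', coordinator: true, overlord: false },
  overlord: { queryType: 'none', coordinator: false, overlord: true },
};

// Assumption: with a query-capable endpoint (a Router), Coordinator/Overlord access
// depends on the management proxy; without one, on which node serves the console.
function deriveProfile(queryType: QueryType, proxy: boolean, node: NodeKind): CapabilityProfile {
  if (queryType === 'none') {
    return { queryType, coordinator: node === 'coordinator', overlord: node === 'overlord' };
  }
  return { queryType, coordinator: proxy, overlord: proxy };
}

// Returns the preset whose fields all match, or undefined for a combination not in the table.
function matchMode(p: CapabilityProfile): string | undefined {
  for (const [name, m] of Object.entries(MODES)) {
    if (m.queryType === p.queryType && m.coordinator === p.coordinator && m.overlord === p.overlord) {
      return name;
    }
  }
  return undefined;
}

console.log(matchMode(deriveProfile('nativeAndSql', false, undefined))); // no-proxy
```

Combinations outside the table, such as SQL disabled on a proxy-less Router, return `undefined`. The presets are therefore common cases rather than an exhaustive list.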
Usage
Apply these heuristics when:
- Troubleshooting why certain UI features are missing or disabled in the web console
- Deploying Druid in non-standard topologies (e.g., separate Coordinator and Overlord)
- Extending the console with features that depend on specific cluster capabilities
- Diagnosing slow console startup (a probe that gets no response waits up to the 15-second timeout)
The Insight (Rule of Thumb)
- Action: Probe the cluster at startup using lightweight HTTP requests with a 15-second timeout, interpret both success and specific error codes (400, 404, 405) as signals, and build a capability profile that gates all subsequent UI feature availability.
- Value: The console gracefully degrades to match the actual cluster configuration instead of showing broken features. A single-node deployment, a SQL-disabled cluster, or a proxy-less Router all get a coherent UI tailored to what actually works.
- Trade-off: The probing adds startup latency (up to 15 seconds for each probe that receives no response; fast error replies such as 404 return immediately). If a service is temporarily down during startup, the console may underestimate capabilities and require a page refresh to recover. The 400-status proxy trick is fragile and depends on the specific error handling behavior of the management proxy servlet.
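The Action above (probe with a timeout and treat specific status codes as signals) can be sketched with the standard `fetch` and `AbortController` APIs. This is an illustrative alternative, not the console's code. The console uses its own `Api.instance` wrapper, which rejects on error statuses, as the quoted catch blocks show. By contrast, `fetch` resolves for 4xx/5xx responses and exposes `status` on the response. The `Fetcher` type and `probe` function are hypothetical. Injecting the fetcher keeps the sketch testable without a live cluster.

```typescript
// 'ok' = 2xx, a number = HTTP error status, 'unreachable' = timeout or network failure.
type ProbeResult = 'ok' | number | 'unreachable';

// Minimal fetch-like signature; the global fetch is assignable to it.
type Fetcher = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

const STATUS_TIMEOUT = 15_000; // same value as Capabilities.STATUS_TIMEOUT

async function probe(
  fetcher: Fetcher,
  url: string,
  method = 'GET',
  timeoutMs = STATUS_TIMEOUT,
): Promise<ProbeResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Race the request against a timer so a fetcher that ignores the abort signal
  // still cannot hang the probe past timeoutMs.
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // also cancel the underlying request where supported
      reject(new Error('timeout'));
    }, timeoutMs);
  });
  try {
    const res = await Promise.race([fetcher(url, { method, signal: controller.signal }), timeout]);
    // fetch does not reject on 4xx/5xx, so error codes arrive here as a status.
    return res.ok ? 'ok' : res.status;
  } catch {
    // Timed out or failed at the network level: there is no status to interpret.
    return 'unreachable';
  } finally {
    // Clear the pending timer so a fast probe does not keep the event loop alive.
    clearTimeout(timer);
  }
}
```

Against a live cluster, the global `fetch` (Node 18+ or a browser) fits the `Fetcher` type. For example, `probe(fetch, 'http://localhost:8888/status')`, where the URL is a placeholder.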
Reasoning
Druid's architecture is highly modular: SQL can be disabled, the management proxy can be absent, MSQ extensions may not be installed, and the console can be served from any node type. Rather than requiring configuration files to declare capabilities, the console probes the live cluster. This follows the principle of zero-configuration discovery.
The 400-status trick for management proxy detection exists for backward compatibility. The `/proxy/enabled` route was added after the proxy feature itself, so an older proxy that does not know the route answers 400 instead of 200. A 400 therefore shows that the proxy servlet received the request and rejected only the unknown path, which proves the proxy is operational. A 404, by contrast, would indicate that no proxy servlet is registered at all.
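For troubleshooting, the same status interpretation can be written as a classifier that also records the reason. The sketch below is hypothetical: `ProxyVerdict` and `interpretProxyProbe` are not console names. Like the quoted `detectManagementProxy`, it treats every outcome other than success or 400 as "no proxy", including 404 and timeouts.

```typescript
// 'ok' = 2xx, a number = HTTP error status, 'unreachable' = timeout or network error.
type ProbeResult = 'ok' | number | 'unreachable';

interface ProxyVerdict {
  enabled: boolean;
  reason: string;
}

// Hypothetical diagnostic variant of the proxy check: same yes/no answer, plus a reason.
function interpretProxyProbe(result: ProbeResult): ProxyVerdict {
  if (result === 'ok') {
    return { enabled: true, reason: '/proxy/enabled answered' };
  }
  if (result === 400) {
    // An older proxy rejected the unknown /proxy/enabled path, so the proxy is present.
    return { enabled: true, reason: 'older proxy without the /proxy/enabled route' };
  }
  if (result === 'unreachable') {
    return { enabled: false, reason: 'no response (timeout or network error)' };
  }
  return { enabled: false, reason: `proxy probe failed with HTTP ${result}` };
}

console.log(interpretProxyProbe(404).reason); // proxy probe failed with HTTP 404
```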
The MSQ Task and MSQ DART probes run in parallel (Promise.all) since they are independent, reducing startup time.
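The two MSQ probes can be sketched as independent async checks joined with `Promise.all`. The `engines` array of `{ name }` objects assumed for the `/druid/v2/sql/engines` response is an assumption for illustration. So are the `Getter` type, the function names, and the rule that any successful response from the task endpoint means "enabled". The `Getter` is expected to reject on non-2xx responses, as the console's `Api.instance` does.

```typescript
// Assumed response shape for GET /druid/v2/sql/engines (illustrative, not verified).
interface EnginesResponse {
  engines?: { name: string }[];
}

// Hypothetical HTTP helper: resolves with the parsed body, rejects on non-2xx or timeout.
type Getter = (path: string) => Promise<unknown>;

// Assumption: any successful response from the task endpoint means MSQ tasks are available.
async function probeMsqTask(get: Getter): Promise<boolean> {
  try {
    await get('/druid/v2/sql/task/enabled');
    return true;
  } catch {
    return false;
  }
}

// DART counts as available only if the engine list contains 'msq-dart'.
async function probeMsqDart(get: Getter): Promise<boolean> {
  try {
    const body = (await get('/druid/v2/sql/engines')) as EnginesResponse;
    return (body.engines ?? []).some((engine) => engine.name === 'msq-dart');
  } catch {
    return false;
  }
}

// The probes do not depend on each other, so both start before either is awaited.
async function detectMsq(get: Getter) {
  const [multiStageQueryTask, multiStageQueryDart] = await Promise.all([
    probeMsqTask(get),
    probeMsqDart(get),
  ]);
  return { multiStageQueryTask, multiStageQueryDart };
}
```

Because both requests are in flight at once, two probes that each hang until a 15-second timeout cost about 15 seconds together rather than 30.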
Code Evidence
Status timeout constant (capabilities.ts:80):
```typescript
static STATUS_TIMEOUT = 15000;
```
SQL detection with SELECT 1337 probe (capabilities.ts:88-130):
```typescript
static async detectQueryType(): Promise<QueryType | undefined> {
  // Check SQL endpoint
  try {
    await Api.instance.post(
      '/druid/v2/sql?capabilities',
      { query: 'SELECT 1337', context: { timeout: Capabilities.STATUS_TIMEOUT } },
      { timeout: Capabilities.STATUS_TIMEOUT },
    );
  } catch (e) {
    const status = e.response?.status;
    if (status !== 405 && status !== 404) {
      return; // other failure
    }
    try {
      await Api.instance.get('/status?capabilities', { timeout: Capabilities.STATUS_TIMEOUT });
    } catch (e) {
      return; // total failure
    }
    // Status works but SQL 405s => the SQL endpoint is disabled
    try {
      await Api.instance.post(
        '/druid/v2?capabilities',
        {
          queryType: 'dataSourceMetadata',
          dataSource: '__web_console_probe__',
          context: { timeout: Capabilities.STATUS_TIMEOUT },
        },
        { timeout: Capabilities.STATUS_TIMEOUT },
      );
    } catch (e) {
      // Note: `status` is still the SQL probe's status (404 or 405 at this point), so this
      // check never fires and every native-probe failure returns 'none'.
      if (status !== 405 && status !== 404) {
        return; // other failure
      }
      return 'none';
    }
    return 'nativeOnly';
  }
  return 'nativeAndSql';
}
```
Management proxy detection with 400-status trick (capabilities.ts:132-144):
```typescript
static async detectManagementProxy(): Promise<boolean> {
  try {
    await Api.instance.get(`/proxy/enabled?capabilities`, {
      timeout: Capabilities.STATUS_TIMEOUT,
    });
  } catch (e) {
    const status = e.response?.status;
    // If we detect error code 400 the management proxy is enabled but just does not know about
    // the recently added /proxy/enabled route so treat this as a win.
    return status === 400;
  }
  return true;
}
```
Parallel MSQ detection (capabilities.ts:197-201):
```typescript
const [multiStageQueryTask, multiStageQueryDart] = await Promise.all([
  Capabilities.detectMultiStageQueryTask(),
  Capabilities.detectMultiStageQueryDart(),
]);
```
Query engine enumeration (capabilities.ts:337-349):
```typescript
public getSupportedQueryEngines(): DruidEngine[] {
  const queryEngines: DruidEngine[] = ['native'];
  if (this.hasSql()) {
    queryEngines.push('sql-native');
  }
  if (this.hasMultiStageQueryTask()) {
    queryEngines.push('sql-msq-task');
  }
  if (this.hasMultiStageQueryDart()) {
    queryEngines.push('sql-msq-dart');
  }
  return queryEngines;
}
```