Heuristic:Apache Druid Capability Detection Strategy

Knowledge Sources
Domains: Cluster Management, Web Console, Feature Detection, Service Discovery
Last Updated: 2026-02-10 10:00 GMT

Overview

The Druid web console uses a multi-step probing strategy to detect which cluster capabilities are available (SQL, native queries, management proxy, MSQ, DART) and adapts its UI feature set accordingly. Every probe request uses a 15-second timeout.

Description

When the web console loads, it must determine what the connected Druid cluster supports. Different deployments may have SQL disabled, may lack the management proxy (meaning Coordinator/Overlord are unreachable from the Router), or may not have Multi-Stage Query (MSQ) extensions installed. The Capabilities class in capabilities.ts orchestrates this detection through a series of HTTP probes.

The detection proceeds in a defined order:

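  1. SQL detection (detectQueryType): POST SELECT 1337 to /druid/v2/sql. If it succeeds, the query type is nativeAndSql. If it fails with 405 or 404, the console first confirms the node is reachable via /status, then probes /druid/v2 with a dataSourceMetadata query: success yields nativeOnly, failure yields none. Any other failure leaves the query type undetermined.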
  2. Node detection: If the query type is none (not on a Router), the console separately probes /druid/coordinator/v1/isLeader and /druid/indexer/v1/isLeader to determine which node it is running on.
  3. Management proxy detection (detectManagementProxy): GET /proxy/enabled. A successful response or a 400 status means the proxy is active. In the 400 case the proxy handled the request but does not know the /proxy/enabled path, which still proves the proxy is enabled.
  4. MSQ Task detection: GET /druid/v2/sql/task/enabled to check for Multi-Stage Query task engine.
  5. MSQ DART detection: GET /druid/v2/sql/engines and look for the msq-dart engine name.

All probes use STATUS_TIMEOUT = 15000 ms. The resulting capability object determines which query engines are offered to the user: native, sql-native, sql-msq-task, and/or sql-msq-dart.
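Step 5 above depends on finding the msq-dart name in the engines response. The following is a minimal sketch of that check as a pure function. The payload shape (an `engines` array of objects with a `name` field) and the helper name `listsDartEngine` are assumptions for illustration and are not confirmed by this page.

```typescript
// Assumed response shape for GET /druid/v2/sql/engines (not confirmed by this page).
interface EnginesPayload {
  engines?: { name?: string }[];
}

// Hypothetical helper: returns true only when the payload lists an engine named 'msq-dart'.
// Any unexpected shape is treated as "DART not available" rather than as an error.
function listsDartEngine(payload: unknown): boolean {
  if (typeof payload !== 'object' || payload === null) return false;
  const engines = (payload as EnginesPayload).engines;
  if (!Array.isArray(engines)) return false;
  return engines.some(engine => engine?.name === 'msq-dart');
}
```

Treating a malformed payload as "not available" matches the degrade-gracefully intent described above: an unexpected response hides the DART engine instead of breaking the console.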

The console defines several pre-built capability modes:

Mode         queryType     coordinator  overlord  Description
full         nativeAndSql  true         true      Full-featured Router deployment
no-sql       nativeOnly    true         true      SQL is disabled
no-proxy     nativeAndSql  false        false     Management proxy not configured
coordinator  none          true         false     Console on Coordinator only
overlord     none          false        true      Console on Overlord only
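The modes above can be read as a lookup from a detected profile to a named mode. The sketch below encodes the table as data and matches a profile against it. The names `ModeProfile`, `MODES`, and `findMode` are hypothetical, and the `QueryType` union is inferred from the values shown on this page rather than copied from capabilities.ts.

```typescript
// Inferred from the values in the table; the real QueryType may differ.
type QueryType = 'none' | 'nativeOnly' | 'nativeAndSql';
type ModeName = 'full' | 'no-sql' | 'no-proxy' | 'coordinator' | 'overlord';

// Hypothetical structure holding one row of the table.
interface ModeProfile {
  queryType: QueryType;
  coordinator: boolean;
  overlord: boolean;
}

const MODES: Record<ModeName, ModeProfile> = {
  'full': { queryType: 'nativeAndSql', coordinator: true, overlord: true },
  'no-sql': { queryType: 'nativeOnly', coordinator: true, overlord: true },
  'no-proxy': { queryType: 'nativeAndSql', coordinator: false, overlord: false },
  'coordinator': { queryType: 'none', coordinator: true, overlord: false },
  'overlord': { queryType: 'none', coordinator: false, overlord: true },
};

// Returns the pre-built mode matching a detected profile, or undefined if none matches.
function findMode(profile: ModeProfile): ModeName | undefined {
  return (Object.keys(MODES) as ModeName[]).find(name => {
    const mode = MODES[name];
    return (
      mode.queryType === profile.queryType &&
      mode.coordinator === profile.coordinator &&
      mode.overlord === profile.overlord
    );
  });
}
```

For example, a Router with SQL enabled but no management proxy (`nativeAndSql`, `false`, `false`) maps to `no-proxy`, while a combination not listed in the table yields `undefined`.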

Usage

Apply these heuristics when:

  • Troubleshooting why certain UI features are missing or disabled in the web console
  • Deploying Druid in non-standard topologies (e.g., separate Coordinator and Overlord)
  • Extending the console with features that depend on specific cluster capabilities
  • Diagnosing slow console startup (each unresponsive probe can block for up to 15 seconds before timing out)

The Insight (Rule of Thumb)

  • Action: Probe the cluster at startup using lightweight HTTP requests with a 15-second timeout, interpret both success and specific error codes (400, 404, 405) as signals, and build a capability profile that gates all subsequent UI feature availability.
  • Value: The console gracefully degrades to match the actual cluster configuration instead of showing broken features. A single-node deployment, a SQL-disabled cluster, or a proxy-less Router all get a coherent UI tailored to what actually works.
  • Trade-off: The probing adds startup latency: a probe that gets no response blocks for up to 15 seconds before timing out, whereas a probe that is rejected with an error status returns immediately. If a service is temporarily down during startup, the console may underestimate capabilities and require a page refresh to recover. The 400-status proxy trick is fragile and depends on the specific error handling behavior of the management proxy servlet.

Reasoning

Druid's architecture is highly modular: SQL can be disabled, the management proxy can be absent, MSQ extensions may not be installed, and the console can be served from any node type. Rather than requiring configuration files to declare capabilities, the console probes the live cluster. This follows the principle of zero-configuration discovery.

The 400-status trick for management proxy detection deserves a closer look: when /proxy/enabled returns 400, the proxy servlet received the request and rejected the specific path, which proves the proxy is operational. Any other failure, such as a 404 or a timeout, is treated as the proxy being absent. The distinction matters because the /proxy/enabled route was added after the proxy feature itself, so older versions return 400 instead of 200.

The MSQ Task and MSQ DART probes run in parallel (Promise.all) since they are independent, reducing startup time.
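The latency benefit of running the two MSQ probes together can be illustrated with a small model: probes in the same stage overlap, so a stage costs as much as its slowest probe, and stages add up. This is a sketch under stated assumptions, not the console's code. The function `startupLatencyMs` and the scenario in which both probes hang until the timeout are hypothetical.

```typescript
const STATUS_TIMEOUT_MS = 15000; // same value as the STATUS_TIMEOUT quoted on this page

// Each inner array is one stage of probes started together; stages run one after another.
// Each number is the time a probe takes to settle, in milliseconds.
function startupLatencyMs(stages: number[][]): number {
  return stages.reduce((total, stage) => total + Math.max(0, ...stage), 0);
}

// Hypothetical worst case: both MSQ probes hang until the timeout fires.
const sequentialMs = startupLatencyMs([[STATUS_TIMEOUT_MS], [STATUS_TIMEOUT_MS]]); // 30000
const parallelMs = startupLatencyMs([[STATUS_TIMEOUT_MS, STATUS_TIMEOUT_MS]]); // 15000
```

In this model, running the two probes in one stage halves the worst-case wait for the MSQ step from 30 seconds to 15 seconds.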

Code Evidence

Status timeout constant (capabilities.ts:80):

  static STATUS_TIMEOUT = 15000;

SQL detection with SELECT 1337 probe (capabilities.ts:88-130):

  static async detectQueryType(): Promise<QueryType | undefined> {
    // Check SQL endpoint
    try {
      await Api.instance.post(
        '/druid/v2/sql?capabilities',
        { query: 'SELECT 1337', context: { timeout: Capabilities.STATUS_TIMEOUT } },
        { timeout: Capabilities.STATUS_TIMEOUT },
      );
    } catch (e) {
      const status = e.response?.status;
      if (status !== 405 && status !== 404) {
        return; // other failure
      }
      try {
        await Api.instance.get('/status?capabilities', { timeout: Capabilities.STATUS_TIMEOUT });
      } catch (e) {
        return; // total failure
      }
      // Status works but SQL 405s => the SQL endpoint is disabled

      try {
        await Api.instance.post(
          '/druid/v2?capabilities',
          {
            queryType: 'dataSourceMetadata',
            dataSource: '__web_console_probe__',
            context: { timeout: Capabilities.STATUS_TIMEOUT },
          },
          { timeout: Capabilities.STATUS_TIMEOUT },
        );
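      } catch (e) {
        // `status` is still the status of the failed SQL probe above (405 or 404),
        // not of this native probe, so this guard never fires and 'none' is returned.
        if (status !== 405 && status !== 404) {
          return; // other failure
        }
        return 'none';
      }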

      return 'nativeOnly';
    }

    return 'nativeAndSql';
  }

Management proxy detection with 400-status trick (capabilities.ts:132-144):

  static async detectManagementProxy(): Promise<boolean> {
    try {
      await Api.instance.get(`/proxy/enabled?capabilities`, {
        timeout: Capabilities.STATUS_TIMEOUT,
      });
    } catch (e) {
      const status = e.response?.status;
      // If we detect error code 400 the management proxy is enabled but just does not know about
      // the recently added /proxy/enabled route so treat this as a win.
      return status === 400;
    }

    return true;
  }

Parallel MSQ detection (capabilities.ts:197-201):

    const [multiStageQueryTask, multiStageQueryDart] = await Promise.all([
      Capabilities.detectMultiStageQueryTask(),
      Capabilities.detectMultiStageQueryDart(),
    ]);

Query engine enumeration (capabilities.ts:337-349):

  public getSupportedQueryEngines(): DruidEngine[] {
    const queryEngines: DruidEngine[] = ['native'];
    if (this.hasSql()) {
      queryEngines.push('sql-native');
    }
    if (this.hasMultiStageQueryTask()) {
      queryEngines.push('sql-msq-task');
    }
    if (this.hasMultiStageQueryDart()) {
      queryEngines.push('sql-msq-dart');
    }
    return queryEngines;
  }
