

Principle:Apache Druid Schema Definition

From Leeroopedia


Knowledge Sources
Domains Data_Ingestion, Schema_Design
Last Updated 2026-02-10 00:00 GMT

Overview

A schema specification principle that defines the dimensional model (dimensions, metrics, and granularity) for data storage in Druid segments.

Description

Schema definition is the step where users finalize how ingested data is stored in Druid. Alongside the primary timestamp column (__time), Druid's columnar storage model has two fundamental column types:

  • Dimensions: Columns used for filtering and grouping. Each dimension is stored as its own column; string dimensions are dictionary-encoded and bitmap-indexed for fast filtering. Dimensions can be strings, longs, floats, doubles, or complex types (for example, nested JSON).
  • Metrics: Measures stored with their aggregation function (SUM, MIN, MAX, COUNT, etc.), which is applied at ingestion time. Metrics enable rollup: combining rows with identical dimension values into a single row.
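As a minimal sketch, the objects below show how the two column types might be declared for a hypothetical web-traffic datasource. The column names (page, country, statusCode, bytes, latencyMs) are assumptions for illustration; count, longSum, and doubleMax are standard Druid aggregator types.

```typescript
// Hypothetical web-traffic schema: all column names are illustrative only.
type DimensionSpec = string | { type: "string" | "long" | "float" | "double"; name: string };
type MetricSpec =
  | { type: "count"; name: string }
  | { type: "longSum" | "doubleMax"; name: string; fieldName: string };

// Dimensions: columns used for filtering and grouping.
const dimensionsSpec: { dimensions: DimensionSpec[] } = {
  dimensions: [
    "page",                               // a bare name declares a string dimension
    "country",
    { type: "long", name: "statusCode" }, // typed numeric dimension
  ],
};

// Metrics: each names its aggregator, its output column, and (except count) its input column.
const metricsSpec: MetricSpec[] = [
  { type: "count", name: "count" },                                    // input rows per stored row
  { type: "longSum", name: "bytes", fieldName: "bytes" },              // summed during rollup
  { type: "doubleMax", name: "maxLatencyMs", fieldName: "latencyMs" }, // maximum kept during rollup
];

console.log(JSON.stringify({ dimensionsSpec, metricsSpec }, null, 2));
```

Keeping a count metric is a common choice when rollup is on, because it records how many input rows were folded into each stored row.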

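The schema also defines two granularitySpec settings: queryGranularity, the time bucket to which row timestamps are truncated at ingestion (and therefore the finest time resolution available for rollup and queries), and rollup, which controls whether rows are pre-aggregated during ingestion.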
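The effect of queryGranularity can be sketched as timestamp truncation: each row's timestamp is floored to the start of its bucket before rows are compared for rollup. The helper below is a simplified illustration for a few fixed-width granularities in UTC; it is not Druid's implementation, which also supports period granularities and time zones. The function name truncate is hypothetical.

```typescript
// Simplified illustration: floor a UTC timestamp to a fixed-width bucket.
type QueryGranularity = "NONE" | "SECOND" | "MINUTE" | "HOUR" | "DAY";

const BUCKET_MS: Record<Exclude<QueryGranularity, "NONE">, number> = {
  SECOND: 1_000,
  MINUTE: 60_000,
  HOUR: 3_600_000,
  DAY: 86_400_000,
};

function truncate(ts: Date, granularity: QueryGranularity): Date {
  // NONE keeps full millisecond precision.
  if (granularity === "NONE") return new Date(ts.getTime());
  const width = BUCKET_MS[granularity];
  // Epoch milliseconds count from UTC midnight, so flooring aligns buckets to UTC boundaries.
  return new Date(Math.floor(ts.getTime() / width) * width);
}

const t = new Date("2024-03-05T14:37:52.123Z");
console.log(truncate(t, "MINUTE").toISOString()); // 2024-03-05T14:37:00.000Z
console.log(truncate(t, "HOUR").toISOString());   // 2024-03-05T14:00:00.000Z
console.log(truncate(t, "DAY").toISOString());    // 2024-03-05T00:00:00.000Z
```

A coarser queryGranularity puts more rows into the same bucket, which increases rollup but discards sub-bucket time detail permanently.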

Two schema modes are supported:

  • Explicit schema: the user manually specifies all dimensions and metrics.
  • Schema discovery (auto): Druid detects dimensions and their types from the input data when the dimensionsSpec sets useSchemaDiscovery: true; metrics are still declared explicitly.
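A minimal sketch of the two modes expressed as dimensionsSpec objects: the explicit form lists every dimension, while the discovery form leaves the list empty and sets useSchemaDiscovery. The column names and the schemaMode helper are hypothetical, for illustration only; they are not part of any Druid API.

```typescript
interface DimensionsSpec {
  dimensions: (string | { type: string; name: string })[];
  useSchemaDiscovery?: boolean;
}

// Explicit schema: the user lists every dimension (names are hypothetical).
const explicitSpec: DimensionsSpec = {
  dimensions: ["page", "country", { type: "long", name: "statusCode" }],
};

// Schema discovery: Druid infers dimensions and their types from the input rows.
const discoverySpec: DimensionsSpec = {
  dimensions: [],
  useSchemaDiscovery: true,
};

// Hypothetical helper: report which mode a spec uses.
function schemaMode(spec: DimensionsSpec): "explicit" | "discovery" {
  return spec.useSchemaDiscovery === true ? "discovery" : "explicit";
}

console.log(schemaMode(explicitSpec));  // explicit
console.log(schemaMode(discoverySpec)); // discovery
```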

Usage

Apply this principle after filtering, once the final set of columns is established. Schema definition directly affects query performance, storage efficiency, and which queries can be answered. It is the last data-shaping step before partitioning and tuning configuration.

Theoretical Basis

Schema definition follows a dimensional modeling pattern:

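interface DimensionsSpec {
  dimensions: DimensionSpec[];     // Columns for filtering/grouping
  useSchemaDiscovery?: boolean;    // Auto-detect mode
}
type DimensionSpec =
  | string                         // shorthand for a string dimension
  | { type: 'string' | 'long' | 'float' | 'double' | 'json' | 'auto'; name: string };

type MetricsSpec = MetricSpec[];   // Pre-aggregation definitions
type MetricSpec =
  | { type: 'count'; name: string }                  // counts input rows; takes no fieldName
  | { type: 'longSum' | 'doubleSum' | 'longMin' | 'longMax' | 'doubleMin' | 'doubleMax';
      name: string; fieldName: string };             // other aggregator types also exist

interface GranularitySpec {
  queryGranularity: 'NONE' | 'SECOND' | 'MINUTE' | 'HOUR' | 'DAY';  // common values; others exist
  rollup: boolean;
}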

When rollup is enabled, rows sharing identical dimension values and falling within the same queryGranularity bucket are combined, with metrics aggregated according to their type.
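The rollup rule above can be simulated in memory. The sketch below groups rows by their HOUR-truncated timestamp plus all dimension values, then aggregates a count and a longSum-style metric. It assumes a fixed HOUR queryGranularity and hypothetical row fields (page, country, bytes); it illustrates the grouping semantics only and is not how Druid actually builds segments.

```typescript
interface InputRow {
  timestamp: string;             // ISO-8601, UTC
  dims: Record<string, string>;  // dimension name -> value
  bytes: number;                 // input to a longSum-style metric
}

interface RolledUpRow {
  time: string;                  // bucket start after truncation
  dims: Record<string, string>;
  count: number;                 // like Druid's count aggregator: input rows combined
  bytes: number;                 // summed, like longSum
}

// Roll up rows at a fixed HOUR granularity (an assumption of this sketch).
function rollup(rows: InputRow[]): RolledUpRow[] {
  const HOUR_MS = 3_600_000;
  const groups = new Map<string, RolledUpRow>();
  for (const row of rows) {
    const time = new Date(Math.floor(Date.parse(row.timestamp) / HOUR_MS) * HOUR_MS).toISOString();
    // Sort dimension entries so the key does not depend on property order.
    const entries = Object.entries(row.dims).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const key = JSON.stringify([time, entries]);
    const existing = groups.get(key);
    if (existing) {
      existing.count += 1;
      existing.bytes += row.bytes;
    } else {
      groups.set(key, { time, dims: { ...row.dims }, count: 1, bytes: row.bytes });
    }
  }
  return Array.from(groups.values());
}

const rows: InputRow[] = [
  { timestamp: "2024-03-05T14:05:00Z", dims: { page: "/home", country: "US" }, bytes: 100 },
  { timestamp: "2024-03-05T14:40:00Z", dims: { page: "/home", country: "US" }, bytes: 250 },
  { timestamp: "2024-03-05T14:41:00Z", dims: { page: "/home", country: "DE" }, bytes: 80 },
  { timestamp: "2024-03-05T15:02:00Z", dims: { page: "/home", country: "US" }, bytes: 40 },
];

console.log(rollup(rows));
```

The four input rows collapse to three: the two 14:xx US rows merge (count 2, bytes 350), while the DE row and the 15:xx row differ in a dimension value or time bucket and stay separate.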

Related Pages

Implemented By
