Principle: Apache Druid Schema Definition
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Schema_Design |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A schema specification principle that defines the dimensional model (dimensions, metrics, and granularity) for data storage in Druid segments.
Description
Schema Definition is the step where users finalize how ingested data is stored in Druid. Druid uses a columnar storage model with two fundamental column types:
- Dimensions: Columns used for filtering and grouping (stored as individual columns for fast lookups). Can be strings, longs, floats, doubles, or complex types.
- Metrics: Pre-aggregated measures stored with their aggregation function (SUM, MIN, MAX, etc.). During rollup, rows with identical dimension values are combined into a single row, and each metric is aggregated according to its function.
The schema also defines queryGranularity (the finest time resolution for rollup) and rollup (whether to pre-aggregate during ingestion).
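To make the three schema sections concrete, here is a sketch of the schema-related fragment of a native batch ingestion spec, written as a Python dict for illustration. The field names (`dimensionsSpec`, `metricsSpec`, `granularitySpec`, `queryGranularity`, `rollup`) follow the Druid ingestion spec; the column names (`country`, `user_id`, `views`) are hypothetical.

```python
# Schema-related fragment of a Druid ingestion spec, as a Python dict.
# Column names are hypothetical; field names follow the Druid spec.
schema_fragment = {
    "dimensionsSpec": {
        "dimensions": [
            "country",                            # string dimension (default type)
            {"type": "long", "name": "user_id"},  # explicitly typed dimension
        ]
    },
    "metricsSpec": [
        {"type": "count", "name": "row_count"},   # count of raw rows per rolled-up row
        {"type": "longSum", "name": "total_views", "fieldName": "views"},
    ],
    "granularitySpec": {
        "queryGranularity": "HOUR",  # finest time resolution kept after rollup
        "rollup": True,              # pre-aggregate during ingestion
    },
}
```

Note that the `count` aggregator takes no `fieldName`: after rollup it records how many raw rows were collapsed into each stored row.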
Two schema modes are supported:
- Explicit schema: User manually specifies all dimensions and metrics
- Schema discovery (auto): Druid auto-detects dimensions from the data with useSchemaDiscovery: true
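The two modes differ only in the `dimensionsSpec`. A minimal sketch of each, again as Python dicts with hypothetical column names:

```python
# Explicit schema: every dimension is listed by hand.
explicit_spec = {
    "dimensionsSpec": {
        "dimensions": ["country", "device", {"type": "long", "name": "user_id"}]
    }
}

# Schema discovery: leave the dimension list empty and let Druid
# detect dimensions (and their types) from the incoming data.
auto_spec = {
    "dimensionsSpec": {
        "dimensions": [],
        "useSchemaDiscovery": True,
    }
}
```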
Usage
Use this principle after filtering when the final set of columns is established. Schema definition directly impacts query performance, storage efficiency, and what types of queries can be answered. It is the last data-shaping step before partitioning and tuning configuration.
Theoretical Basis
Schema definition follows a dimensional modeling pattern:
```
DimensionsSpec = {
  dimensions: DimensionSpec[],   // Columns for filtering/grouping
  useSchemaDiscovery?: boolean   // Auto-detect mode
}

MetricsSpec = MetricSpec[]       // Pre-aggregation definitions
MetricSpec = { type: 'longSum' | 'doubleSum' | 'count' | ..., name: string, fieldName: string }

GranularitySpec = {
  queryGranularity: 'NONE' | 'SECOND' | 'MINUTE' | 'HOUR' | 'DAY',
  rollup: boolean
}
```
When rollup is enabled, rows sharing identical dimension values and falling within the same queryGranularity bucket are combined, with metrics aggregated according to their type.
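The combining step can be sketched as a toy simulation (illustrative Python, not Druid code): truncate each timestamp to its `queryGranularity` bucket, group rows by the bucket plus their dimension values, and sum a `longSum`-style metric within each group.

```python
from collections import defaultdict
from datetime import datetime

def rollup(rows, dimensions, metric, granularity="HOUR"):
    """Toy rollup: collapse rows that share dimension values within
    the same granularity bucket, summing the given metric."""
    buckets = defaultdict(int)
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        if granularity == "HOUR":
            bucket = ts.replace(minute=0, second=0, microsecond=0)
        else:  # only HOUR is implemented in this sketch
            raise NotImplementedError(granularity)
        key = (bucket,) + tuple(row[d] for d in dimensions)
        buckets[key] += row[metric]
    return dict(buckets)

# Hypothetical raw rows: three events, two of them in the 08:00 hour.
rows = [
    {"timestamp": "2026-02-10T08:05:00", "country": "US", "views": 3},
    {"timestamp": "2026-02-10T08:40:00", "country": "US", "views": 2},
    {"timestamp": "2026-02-10T09:10:00", "country": "US", "views": 1},
]
result = rollup(rows, ["country"], "views")
# Two stored rows remain: the 08:00 bucket sums to 5, the 09:00 bucket keeps 1.
```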