Principle:Apache Druid Data Transformation
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Transformation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A data enrichment principle that applies expression-based transformations to create derived columns during ingestion.
Description
Data Transformation creates new computed columns from existing data during the ingestion pipeline. In Druid, the transformSpec.transforms array lists named expressions written in the Druid expression language. Each expression produces a column with the given name, which the rest of the ingestion spec (for example dimensionsSpec and metricsSpec) can then reference.
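A minimal sketch of the shape of one transform entry, built as a Python dict and serialized to JSON. Druid transform entries use `"type": "expression"` plus `name` and `expression`; the helper `expression_transform` and the input columns `first_name` and `last_name` are hypothetical, chosen only for illustration.

```python
import json


def expression_transform(name: str, expression: str) -> dict:
    """Build one transformSpec.transforms entry (hypothetical helper)."""
    return {"type": "expression", "name": name, "expression": expression}


# Hypothetical input columns "first_name" and "last_name";
# the transform adds a derived "full_name" column.
transform_spec = {
    "transforms": [
        expression_transform("full_name", "concat(first_name, ' ', last_name)"),
    ]
}

# In an ingestion spec, transformSpec sits inside dataSchema.
data_schema_fragment = {"transformSpec": transform_spec}
print(json.dumps(data_schema_fragment, indent=2))
```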
Transforms can perform operations such as:
- String manipulation (concat, substring, regex extraction)
- Mathematical calculations (arithmetic, rounding, clamping)
- Conditional logic (case/when expressions)
- Lookup table joins (mapping values via Druid lookups)
- Type casting (string to number, timestamp formatting)
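The categories above can be illustrated with one Druid expression each. This is a sketch, not a reference: all column names (`url`, `phone`, `price`, `discount`, `status`, `country_code`, `amount_str`) and the lookup name `country_names` are hypothetical, the `__time` example assumes the parsed ingestion timestamp is visible to transforms, and function availability should be checked against the expression documentation for your Druid version.

```python
# One illustrative Druid expression per category; all column names and the
# lookup name "country_names" are hypothetical.
EXAMPLE_EXPRESSIONS = {
    # String manipulation
    "label": "concat(country_code, '-', status)",
    "area_code": "substring(phone, 0, 3)",
    "url_domain": "regexp_extract(url, '^https?://([^/]+)', 1)",
    # Mathematical calculations: arithmetic with rounding, and clamping to [0, 100]
    "net_price": "round(price - discount, 2)",
    "discount_pct": "greatest(0, least(100, 100.0 * discount / price))",
    # Conditional logic
    "status_group": "case_searched(status >= 500, 'server_error', status >= 400, 'client_error', 'ok')",
    # Lookup (assumes the lookup is loaded where the ingestion task runs)
    "country_name": "lookup(country_code, 'country_names')",
    # Type casting and timestamp formatting
    "amount": "cast(amount_str, 'DOUBLE')",
    "day": "timestamp_format(__time, 'yyyy-MM-dd')",
}

# Wrap each expression as a transformSpec.transforms entry.
transforms = [
    {"type": "expression", "name": name, "expression": expr}
    for name, expr in EXAMPLE_EXPRESSIONS.items()
]
```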
The transform step occurs after timestamp extraction and before filtering, allowing transformed columns to be used in filter conditions.
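Because transforms run before the filter, a transformSpec filter can match on a column that exists only after transformation. A minimal sketch, assuming a hypothetical input column `status` and using a standard `selector` filter on the derived `is_error` column:

```python
import json

# "status" is a hypothetical input column; "is_error" exists only after
# the transform runs.
transform_spec = {
    "transforms": [
        {
            "type": "expression",
            "name": "is_error",
            "expression": "if(status >= 400, 'yes', 'no')",
        }
    ],
    # Transforms are applied before this filter, so it can read "is_error".
    "filter": {"type": "selector", "dimension": "is_error", "value": "yes"},
}

print(json.dumps(transform_spec, indent=2))
```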
Usage
Use this principle when ingested data needs enrichment or derived columns that do not exist in the raw source. Transforms are optional; skip this step if the raw columns are sufficient for your use case.
Theoretical Basis
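Data transformation follows an expression evaluation pipeline:

    Transform     = { type: "expression", name: string, expression: string }
    TransformSpec = { transforms: Transform[], filter?: Filter }

    For each input row:
        output = copy(row)
        For each transform in transformSpec.transforms:
            output[transform.name] = evaluate(transform.expression, row)
        If transformSpec.filter is absent or matches output:
            ingest(output)

Every expression is evaluated against the input row, so a transform cannot reference a column produced by another transform. A transform whose name matches an input field shadows that field in the output row.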
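The evaluation loop can be emulated locally to make its semantics concrete. In this sketch Python callables stand in for parsed Druid expressions, and the function name `apply_transform_spec` is hypothetical; it models the rules described above (evaluation against the input row, shadowing, filter after transforms), not Druid's actual implementation.

```python
from typing import Callable, Optional

Row = dict[str, object]
Expression = Callable[[Row], object]  # stand-in for a parsed Druid expression


def apply_transform_spec(
    row: Row,
    transforms: list[tuple[str, Expression]],
    row_filter: Optional[Callable[[Row], bool]] = None,
) -> Optional[Row]:
    """Return the row to ingest, or None if the filter drops it."""
    output = dict(row)  # untransformed input columns pass through
    for name, expression in transforms:
        # Evaluate against the original input row, not `output`, so one
        # transform never sees another transform's result.
        output[name] = expression(row)  # a matching input name is shadowed
    # The filter runs after all transforms and sees derived columns.
    if row_filter is not None and not row_filter(output):
        return None
    return output


row = {"price": 10.0, "qty": 3, "country": "fr"}
transforms = [
    ("revenue", lambda r: r["price"] * r["qty"]),
    ("country", lambda r: str(r["country"]).upper()),  # shadows input "country"
]
print(apply_transform_spec(row, transforms, lambda r: r["revenue"] > 20))
# {'price': 10.0, 'qty': 3, 'country': 'FR', 'revenue': 30.0}
```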
In the web console's data loader, the Druid sampler applies transforms on the server, so the preview shows both the original and the derived columns. Automatic dimension detection then includes transform output columns in the proposed schema.
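The same server-side preview can be requested outside the console by posting an ingestion spec to the sampler endpoint. This is a sketch under assumptions: the router address `http://localhost:8888`, the inline sample row, and the field layout of the request body should be verified against the sampler API documentation for your Druid version. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical router address; adjust for your cluster.
DRUID_URL = "http://localhost:8888"

sample_request = {
    "type": "index_parallel",
    "spec": {
        "type": "index_parallel",
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "inline",
                "data": '{"ts": "2024-01-01T00:00:00Z", "price": 10, "qty": 3}',
            },
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "sample",
            "timestampSpec": {"column": "ts", "format": "iso"},
            # Empty dimensionsSpec: schemaless mode, dimensions come from the rows.
            "dimensionsSpec": {},
            "transformSpec": {
                "transforms": [
                    {"type": "expression", "name": "revenue", "expression": "price * qty"}
                ]
            },
        },
    },
    "samplerConfig": {"numRows": 10},
}

request = urllib.request.Request(
    f"{DRUID_URL}/druid/indexer/v1/sampler",
    data=json.dumps(sample_request).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; not executed here.
```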