Principle:Apache Druid Data Transformation

Knowledge Sources	Apache Druid, Druid Transforms
Domains	Data_Ingestion, Data_Transformation
Last Updated	2026-02-10 00:00 GMT

Overview

A data enrichment principle that applies expression-based transformations to create derived columns during ingestion.

Description

Data Transformation lets users create new computed columns from existing data during ingestion. Druid's transformSpec.transforms array, part of the ingestion spec's dataSchema, defines a list of named expressions written in the Druid expression language; each expression produces one new column.
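As an illustration, a transformSpec with two expression transforms might be assembled as follows. This is a minimal sketch: the input columns (first_name, last_name, price, quantity) and the output names are hypothetical, and the dictionary is built in Python only so that it can be serialized to the JSON that an ingestion spec expects.

```python
import json

# Hypothetical input columns: first_name, last_name, price, quantity.
transform_spec = {
    "transforms": [
        {
            "type": "expression",
            "name": "full_name",
            "expression": "concat(first_name, ' ', last_name)",
        },
        {
            "type": "expression",
            "name": "total",
            "expression": "price * quantity",
        },
    ]
}

# Serialize to the JSON form used inside an ingestion spec.
transform_spec_json = json.dumps(transform_spec, indent=2)
print(transform_spec_json)
```

Each entry names the output column and gives the expression that computes it; the raw columns remain available alongside the derived ones.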

Transforms can perform operations such as:

String manipulation (concat, substring, regexp_extract)
Mathematical calculations (arithmetic, round, clamping with min and max)
Conditional logic (if, case_searched, case_simple)
Lookup-based value mapping (the lookup function applied to a registered Druid lookup)
Type casting and formatting (cast, timestamp_format)
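The categories above could map to expressions such as the following. This is a sketch rather than a reference: the column names (url, score, status, country_code, response_bytes) and the lookup name country_names are hypothetical, and the helper to_transforms is introduced here only for illustration. Function names and argument order should be checked against the Druid expression documentation for the version in use.

```python
# One illustrative Druid expression per category; output column name -> expression.
example_expressions = {
    # String manipulation: pull the host out of a URL.
    "host": "regexp_extract(url, '^https?://([^/]+)', 1)",
    # Math: round to one decimal place, then clamp into [0, 100].
    "score_clamped": "max(0, min(100, round(score, 1)))",
    # Conditional logic: bucket HTTP status codes.
    "status_class": (
        "case_searched(status >= 500, 'server_error', "
        "status >= 400, 'client_error', 'ok')"
    ),
    # Lookup: map a code to a display value via a registered lookup.
    "country_name": "lookup(country_code, 'country_names')",
    # Type casting: force a string field to a long.
    "bytes_long": "cast(response_bytes, 'LONG')",
}


def to_transforms(expressions):
    """Wrap each expression in an expression-type transform entry (hypothetical helper)."""
    return [
        {"type": "expression", "name": name, "expression": expr}
        for name, expr in expressions.items()
    ]


transforms = to_transforms(example_expressions)
```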

The transform step occurs after timestamp extraction and before filtering, allowing transformed columns to be used in filter conditions.
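Because filtering runs after transforms, a filter placed in the same transformSpec can name a derived column. The sketch below assumes a hypothetical input column country and uses a selector filter on the transform output country_upper; the column names and the filter value are illustrative only.

```python
import json

filtered_transform_spec = {
    "transforms": [
        {
            "type": "expression",
            "name": "country_upper",
            "expression": "upper(country)",
        }
    ],
    # The filter refers to the derived column, not to a raw input column.
    "filter": {
        "type": "selector",
        "dimension": "country_upper",
        "value": "US",
    },
}

derived_names = {t["name"] for t in filtered_transform_spec["transforms"]}
filter_uses_derived_column = (
    filtered_transform_spec["filter"]["dimension"] in derived_names
)
print(json.dumps(filtered_transform_spec, indent=2))
```

With this spec, rows whose country normalizes to "US" would be kept regardless of the casing in the raw data.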

Usage

Use this principle when ingested data needs enrichment or derived columns that do not exist in the raw source. Transforms are optional; skip this step if the raw columns are sufficient for your use case.

Theoretical Basis

Data transformation follows an expression evaluation pipeline:

Transform = { type: "expression", name: string, expression: string }
TransformSpec = { transforms: Transform[] }

For each row:
  For each transform:
    row[transform.name] = evaluate(transform.expression, row)
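The loop above can be sketched in runnable form. Python callables stand in for parsed Druid expressions here, since this sketch does not implement the expression language. It also evaluates every expression against the original input row rather than the partially updated one, on the assumption (worth confirming in the Druid transformSpec documentation) that transforms read input fields and not each other's output. The names apply_transforms, Row, and Transform are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# (output column name, stand-in for a parsed expression)
Transform = Tuple[str, Callable[[Row], Any]]


def apply_transforms(row: Row, transforms: List[Transform]) -> Row:
    """Return a copy of `row` with one derived column added per transform."""
    out = dict(row)
    for name, evaluate in transforms:
        # Evaluate against the original input row, not the partially built output.
        out[name] = evaluate(row)
    return out


transforms: List[Transform] = [
    ("full_name", lambda r: f"{r['first_name']} {r['last_name']}"),
    ("total", lambda r: r["price"] * r["quantity"]),
]

row = {"first_name": "Ada", "last_name": "Lovelace", "price": 2.5, "quantity": 4}
result = apply_transforms(row, transforms)
print(result)
```

The input row is left unmodified, and the result carries the original columns plus full_name and total.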

During preview, the Druid sampler applies transforms server-side, and the preview shows both original and derived columns. Auto-dimension detection then includes transform output columns in the schema.
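The effect on the previewed schema can be approximated with a small sketch. Both helpers below are hypothetical and simplify the real behavior: they assume the preview lists input columns first and derived columns after them, and that every column other than the timestamp column __time is a candidate dimension.

```python
def preview_columns(input_columns, transforms):
    """List the columns a preview would show: inputs first, then derived columns."""
    columns = list(input_columns)
    for transform in transforms:
        # A transform that reuses an existing name replaces it, so do not list it twice.
        if transform["name"] not in columns:
            columns.append(transform["name"])
    return columns


def auto_dimensions(columns, time_column="__time"):
    """Treat every previewed column except the timestamp as a candidate dimension."""
    return [c for c in columns if c != time_column]


sample_transforms = [
    {"type": "expression", "name": "total", "expression": "price * quantity"},
]
columns = preview_columns(["__time", "price", "quantity"], sample_transforms)
dimensions = auto_dimensions(columns)
print(columns, dimensions)
```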

Related Pages

Implemented By

Implementation:Apache_Druid_SampleForTransform
