Principle:Apache Druid Data Transformation

From Leeroopedia
Revision as of 18:24, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Apache_Druid_Data_Transformation.md)


Knowledge Sources
Domains Data_Ingestion, Data_Transformation
Last Updated 2026-02-10 00:00 GMT

Overview

A data enrichment principle that applies expression-based transformations to create derived columns during ingestion.

Description

Data Transformation lets users create computed columns from existing data during ingestion. The transforms array in Druid's transformSpec lists named expressions, each of which generates one new column using the Druid expression language.

Transforms can perform operations such as:

  • String manipulation (concat, substring, regexp_extract)
  • Mathematical calculations (arithmetic, rounding, clamping)
  • Conditional logic (if, case_searched, case_simple)
  • Lookup table mapping (translating values with the lookup function)
  • Type casting and formatting (cast for string to number, timestamp_format for timestamps)

The transform step occurs after timestamp extraction and before filtering, allowing transformed columns to be used in filter conditions.
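
As a rough illustration of the operations listed above, the following Python sketch assembles a transformSpec as a plain dictionary and prints it as JSON. The input column names (first_name, last_name, price, status, country_code, quantity), the lookup name country_names, and the helper expression_transform are all hypothetical; only the expression functions and the overall spec shape are meant to mirror Druid.

```python
import json


def expression_transform(name, expression):
    """Build one expression-type entry for transformSpec.transforms."""
    return {"type": "expression", "name": name, "expression": expression}


# All column names and the lookup name below are made up for illustration.
transform_spec = {
    "transforms": [
        # String manipulation
        expression_transform("full_name", "concat(first_name, ' ', last_name)"),
        # Mathematical calculation
        expression_transform("price_rounded", "round(price, 2)"),
        # Conditional logic
        expression_transform("is_active", "if(status == 'active', 1, 0)"),
        # Lookup mapping (assumes a lookup named 'country_names' is registered)
        expression_transform("country_name", "lookup(country_code, 'country_names')"),
        # Type casting
        expression_transform("quantity_long", "cast(quantity, 'LONG')"),
    ],
    # The filter runs after the transforms, so it can reference a derived column.
    "filter": {"type": "selector", "dimension": "is_active", "value": "1"},
}

print(json.dumps(transform_spec, indent=2))
```

The filter here deliberately references is_active, a derived column, to show why the ordering of transforms before filtering matters.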

Usage

Use this principle when ingested data needs enrichment or derived columns that do not exist in the raw source. Transforms are optional: skip this step if the raw columns are sufficient for your use case.

Theoretical Basis

Data transformation follows an expression evaluation pipeline:

Transform = { type: "expression", name: string, expression: string }
TransformSpec = { transforms: Transform[] }

For each input row:
  output = copy(row)
  For each transform:
    output[transform.name] = evaluate(transform.expression, row)
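
The evaluation loop can be mimicked with a small, self-contained Python model. This is a toy sketch, not Druid's evaluator: Python callables stand in for expression strings, and the names Transform, apply_transforms, and ingest are hypothetical. It evaluates every expression against the input row rather than against earlier transform outputs, which is a modeling choice made here, and it applies the filter only after all transforms have run.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, NamedTuple

Row = Dict[str, Any]


class Transform(NamedTuple):
    name: str
    # Stand-in for a Druid expression string: a Python callable over the input row.
    expression: Callable[[Row], Any]


def apply_transforms(row: Row, transforms: List[Transform]) -> Row:
    """Return a copy of row with one derived column per transform."""
    out = dict(row)
    for t in transforms:
        # Each expression sees only the original input row.
        out[t.name] = t.expression(row)
    return out


def ingest(rows: Iterable[Row], transforms: List[Transform],
           keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Apply transforms first, then filter on the transformed row."""
    for row in rows:
        transformed = apply_transforms(row, transforms)
        if keep(transformed):
            yield transformed


rows = [
    {"first_name": "Ada", "last_name": "Lovelace", "quantity": 3, "unit_price": 4},
    {"first_name": "Alan", "last_name": "Turing", "quantity": 1, "unit_price": 5},
]
transforms = [
    Transform("full_name", lambda r: f"{r['first_name']} {r['last_name']}"),
    Transform("total", lambda r: r["quantity"] * r["unit_price"]),
]

# The filter references the derived column "total".
result = list(ingest(rows, transforms, keep=lambda r: r["total"] > 10))
print(result)
```

Only the first row survives the filter, because its derived total is 12 while the second row's total is 5.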

Transforms are applied server-side by the Druid sampler, and the preview shows both original and derived columns. Auto-dimension detection includes transform output columns in the schema.
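
To make the sampler step more concrete, the sketch below builds a request body that could be posted to the sampler endpoint (/druid/indexer/v1/sampler). It only constructs the payload and does not contact a server. The payload layout is an assumption modeled on how the sampler is commonly called and should be checked against your Druid version; the function build_sampler_request, the datasource name, and the column names ts, first_name, and last_name are hypothetical. The empty dimensionsSpec is intended to leave dimension detection to Druid.

```python
import json


def build_sampler_request(sample_lines, transforms, num_rows=100):
    """Build a sampler request body (assumed shape) with inline JSON sample data."""
    return {
        "type": "index",
        "spec": {
            "type": "index",
            "ioConfig": {
                "type": "index",
                # Inline input source: the sample rows travel inside the request.
                "inputSource": {"type": "inline", "data": "\n".join(sample_lines)},
                "inputFormat": {"type": "json"},
            },
            "dataSchema": {
                "dataSource": "sample",  # hypothetical name
                "timestampSpec": {"column": "ts", "format": "iso"},
                # No explicit dimensions, so detection is left to Druid.
                "dimensionsSpec": {},
                "transformSpec": {"transforms": transforms},
            },
        },
        "samplerConfig": {"numRows": num_rows, "timeoutMs": 15000},
    }


sample_lines = [
    json.dumps({"ts": "2026-01-01T00:00:00Z", "first_name": "Ada", "last_name": "Lovelace"}),
]
transforms = [
    {"type": "expression", "name": "full_name",
     "expression": "concat(first_name, ' ', last_name)"},
]

payload = build_sampler_request(sample_lines, transforms, num_rows=10)
print(json.dumps(payload, indent=2))
```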

Related Pages

Implemented By
