Implementation:Apache Druid SampleForTransform
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Transformation |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete sampler API client function for previewing expression-based data transformations on sample data.
Description
The sampleForTransform function applies user-defined transforms to cached sample data. It works in two phases. First, it calls the sampler without transforms and auto-detects the base dimensions via guessDimensionsFromSampleResponse(). It then combines those dimensions with the transform-generated column names to build a complete dimensionsSpec. The final sampler call includes the transforms and returns rows that contain both the original and the derived columns.
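The dimension-merging step described above can be illustrated with a minimal sketch. The helper name mergeDimensionNames and the TransformLike type are hypothetical and are not taken from sampler.ts. The sketch only assumes that each transform contributes its name as an output column and that duplicate names should appear once.

```typescript
// Hypothetical, simplified shape of a transform entry; only `name` matters here.
interface TransformLike {
  name: string;
}

// Merge auto-detected base dimensions with transform output names,
// keeping first-seen order and dropping duplicates.
function mergeDimensionNames(baseDimensions: string[], transforms: TransformLike[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const name of [...baseDimensions, ...transforms.map(t => t.name)]) {
    if (!seen.has(name)) {
      seen.add(name);
      merged.push(name);
    }
  }
  return merged;
}

// Example: a transform that overwrites an existing column ("age") is not listed twice.
const mergedDimensions = mergeDimensionNames(
  ['first_name', 'last_name', 'age'],
  [{ name: 'full_name' }, { name: 'age' }],
);
console.log(mergedDimensions);
```

Deduplication matters because a transform may reuse the name of an existing column, in which case the column should still be declared only once in the dimensionsSpec.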
Usage
Call this function after timestamp configuration when the user has defined one or more transform expressions. The function handles the complexity of merging auto-detected dimensions with transform-generated columns.
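A caller may want to check that at least one transform is defined before requesting a transform preview. The following is a minimal sketch of such a guard. SpecLike and hasTransforms are hypothetical names introduced for illustration, and SpecLike models only the small part of the ingestion spec that the check reads.

```typescript
// Hypothetical minimal view of the spec; the real IngestionSpec type is much larger.
interface SpecLike {
  spec?: {
    dataSchema?: {
      transformSpec?: {
        transforms?: { type: string; name: string; expression: string }[];
      };
    };
  };
}

// Returns true only when the spec carries a non-empty transforms array.
function hasTransforms(spec: SpecLike): boolean {
  const transforms = spec.spec?.dataSchema?.transformSpec?.transforms;
  return Array.isArray(transforms) && transforms.length > 0;
}

// Example: a spec with an empty transformSpec does not qualify.
console.log(hasTransforms({ spec: { dataSchema: { transformSpec: {} } } }));
```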
Code Reference
Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L468-L538
Signature
export async function sampleForTransform(
  spec: Partial<IngestionSpec>,
  cacheRows: CacheRows,
  forceSegmentSortByTime: boolean,
): Promise<SampleResponse>
Import
import { sampleForTransform } from '../utils/sampler';
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with transformSpec.transforms array configured |
| cacheRows | CacheRows | Yes | Cached parsed data from previous sampler calls |
| forceSegmentSortByTime | boolean | Yes | Whether to force time-based segment sorting in dimensionsSpec |
Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of rows with original columns plus transform-generated columns |
| cacheKey | string or undefined | Cache key for subsequent calls |
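Reading a derived column out of the response can be sketched as follows. The types SampleEntryLike and SampleResponseLike are simplified assumptions made for this example, not the definitions from the web console, and columnValues is a hypothetical helper. The sketch assumes that each row exposes its post-transform values under parsed, as in the usage example on this page, and that some rows may lack parsed.

```typescript
// Assumed, simplified shapes for illustration only.
interface SampleEntryLike {
  input?: Record<string, unknown>;
  parsed?: Record<string, unknown>;
}

interface SampleResponseLike {
  data: SampleEntryLike[];
  cacheKey?: string;
}

// Collect the values of one column from every row that has parsed output,
// skipping rows without a `parsed` object.
function columnValues(response: SampleResponseLike, column: string): unknown[] {
  const values: unknown[] = [];
  for (const entry of response.data) {
    if (entry.parsed !== undefined) {
      values.push(entry.parsed[column]);
    }
  }
  return values;
}

// Example with hand-written rows; the second row has no parsed output.
const exampleResponse: SampleResponseLike = {
  data: [
    { input: { first_name: 'Alice' }, parsed: { first_name: 'Alice', full_name: 'Alice Smith' } },
    { input: { first_name: 'Bob' } },
  ],
};
console.log(columnValues(exampleResponse, 'full_name'));
```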
Usage Examples
Adding a Derived Column
import { sampleForTransform } from '../utils/sampler';

// Ingestion spec with two expression transforms that add derived columns.
const spec = {
  type: 'index_parallel',
  spec: {
    dataSchema: {
      timestampSpec: { column: 'ts', format: 'iso' },
      transformSpec: {
        transforms: [
          { type: 'expression', name: 'full_name', expression: "concat(first_name, ' ', last_name)" },
          { type: 'expression', name: 'age_bucket', expression: "div(age, 10) * 10" },
        ],
      },
    },
    ioConfig: { /* ... */ },
  },
};

// cachedRows is the CacheRows value obtained from a previous sampler call.
const result = await sampleForTransform(spec, cachedRows, true);
// For an input row with first_name 'Alice', last_name 'Smith', and an age in the 30s:
// result.data[0].parsed.full_name = 'Alice Smith'
// result.data[0].parsed.age_bucket = 30