Implementation:Apache Druid SampleForTransform

From Leeroopedia


Knowledge Sources
Domains Data_Ingestion, Data_Transformation
Last Updated 2026-02-10 00:00 GMT

Overview

Concrete sampler API client function for previewing expression-based data transformations on sample data.

Description

The sampleForTransform function applies user-defined transforms to cached sample data in two phases. First, it queries the sampler without transforms and auto-detects the base dimensions via guessDimensionsFromSampleResponse(). It then combines those dimensions with the transform-generated column names to build a complete dimensionsSpec. The final sampler call includes the transforms and returns data with both the original and the derived columns.
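The dimension-merging step described above can be sketched as follows. This is a minimal illustration, not the code in sampler.ts: the TransformSketch type and the mergeDimensionNames helper are hypothetical names introduced here, and the real function may order or filter columns differently.

```typescript
// Hypothetical, simplified shape of a transform entry; only `name` matters here.
interface TransformSketch {
  type: string;
  name: string;
  expression: string;
}

// Hypothetical helper (not part of sampler.ts): union of the auto-detected
// base dimensions and the transform output names, keeping first-seen order.
function mergeDimensionNames(
  baseDimensions: string[],
  transforms: TransformSketch[],
): string[] {
  const merged: string[] = [];
  const seen = new Set<string>();
  for (const name of [...baseDimensions, ...transforms.map(t => t.name)]) {
    // A transform may reuse the name of an existing column; list it only once.
    if (seen.has(name)) continue;
    seen.add(name);
    merged.push(name);
  }
  return merged;
}

// Phase 1 (assumed result): dimensions detected from a sampler call without transforms.
const detectedDimensions = ['first_name', 'last_name', 'age'];

// Phase 2: add the columns that the transforms will generate.
const exampleTransforms: TransformSketch[] = [
  { type: 'expression', name: 'full_name', expression: "concat(first_name, ' ', last_name)" },
  { type: 'expression', name: 'age_bucket', expression: 'div(age, 10) * 10' },
];

const mergedNames = mergeDimensionNames(detectedDimensions, exampleTransforms);
console.log(mergedNames);
// → [ 'first_name', 'last_name', 'age', 'full_name', 'age_bucket' ]
```

Deduplicating by name matters when a transform overwrites an existing column, since the same dimension should not be listed twice in a dimensionsSpec.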

Usage

Call this function after timestamp configuration, once the user has defined one or more transform expressions. The function merges the auto-detected dimensions with the transform-generated columns, so the caller does not have to assemble the dimensionsSpec by hand.

Code Reference

Source Location

  • Repository: Apache Druid
  • File: web-console/src/utils/sampler.ts
  • Lines: L468-L538

Signature

export async function sampleForTransform(
  spec: Partial<IngestionSpec>,
  cacheRows: CacheRows,
  forceSegmentSortByTime: boolean,
): Promise<SampleResponse>

Import

import { sampleForTransform } from '../utils/sampler';

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with transformSpec.transforms array configured |
| cacheRows | CacheRows | Yes | Cached parsed data from previous sampler calls |
| forceSegmentSortByTime | boolean | Yes | Whether to force time-based segment sorting in dimensionsSpec |

Outputs

| Name | Type | Description |
|------|------|-------------|
| data | SampleEntry[] | Array of rows with original columns plus transform-generated columns |
| cacheKey | string or undefined | Cache key for subsequent calls |
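As an illustration of consuming this output, the sketch below collects the values of one column across the returned rows. The SampleEntrySketch and SampleResponseSketch types are simplified stand-ins declared here for the example (the real SampleEntry and SampleResponse types in the web console carry more fields), and columnValues is a hypothetical helper.

```typescript
// Simplified stand-ins for the real response types; assumptions for this example.
interface SampleEntrySketch {
  parsed?: Record<string, unknown>; // absent when a row could not be parsed
}

interface SampleResponseSketch {
  data: SampleEntrySketch[];
  cacheKey?: string;
}

// Hypothetical helper: values of `column` for every row that was parsed.
function columnValues(response: SampleResponseSketch, column: string): unknown[] {
  const values: unknown[] = [];
  for (const entry of response.data) {
    if (entry.parsed === undefined) continue; // skip unparsed rows
    values.push(entry.parsed[column]);
  }
  return values;
}

// Made-up response for demonstration only.
const exampleResponse: SampleResponseSketch = {
  data: [
    { parsed: { first_name: 'Alice', last_name: 'Smith', full_name: 'Alice Smith' } },
    {}, // a row without parsed data
    { parsed: { first_name: 'Bob', last_name: 'Jones', full_name: 'Bob Jones' } },
  ],
  cacheKey: 'example-key',
};

console.log(columnValues(exampleResponse, 'full_name'));
// → [ 'Alice Smith', 'Bob Jones' ]
```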

Usage Examples

Adding a Derived Column

import { sampleForTransform } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    dataSchema: {
      timestampSpec: { column: 'ts', format: 'iso' },
      transformSpec: {
        transforms: [
          { type: 'expression', name: 'full_name', expression: "concat(first_name, ' ', last_name)" },
          { type: 'expression', name: 'age_bucket', expression: "div(age, 10) * 10" },
        ],
      },
    },
    ioConfig: { /* ... */ },
  },
};

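// cachedRows: CacheRows retained from an earlier sampler step (not shown here).
const result = await sampleForTransform(spec, cachedRows, true);
// Assuming the first cached row has first_name 'Alice', last_name 'Smith', and an age in the 30s:
// result.data[0].parsed.full_name === 'Alice Smith'
// result.data[0].parsed.age_bucket === 30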
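To make the expected values above easier to check, here is a rough local emulation of what the two expressions compute for a single row. It is plain TypeScript, not Druid's expression evaluator; the PersonRow type and previewDerivedColumns helper are hypothetical, and the sketch assumes age is an integer so that div behaves as truncating integer division.

```typescript
// Hypothetical input row shape matching the columns used in the example.
interface PersonRow {
  first_name: string;
  last_name: string;
  age: number; // assumed to be an integer
}

// Local emulation of the two example transforms for one row.
function previewDerivedColumns(row: PersonRow): { full_name: string; age_bucket: number } {
  return {
    // concat(first_name, ' ', last_name)
    full_name: `${row.first_name} ${row.last_name}`,
    // div(age, 10) * 10: integer division, then scale back to the decade
    age_bucket: Math.trunc(row.age / 10) * 10,
  };
}

const sampleRow: PersonRow = { first_name: 'Alice', last_name: 'Smith', age: 34 };
console.log(previewDerivedColumns(sampleRow));
// → { full_name: 'Alice Smith', age_bucket: 30 }
```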
