Implementation:Apache Druid SampleForSchema
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Schema_Design |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Concrete sampler API client function for previewing the final schema definition applied to sample data.
Description
The sampleForSchema function sends a sampling request with the complete dataSchema (dimensionsSpec, metricsSpec, granularitySpec, transformSpec, timestampSpec) to the Druid Sampler API. It is the last sampling step and previews how the sampled rows will look once stored in Druid segments. The preview shows which columns are dimensions and which are metrics. It also shows how rollup combines rows and the time granularity to which timestamps are truncated.
Usage
Call this function after the user has configured dimensions, metrics, rollup settings, and query granularity. The result previews the final ingested data shape.
Code Reference
Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L609-L643
Signature
```typescript
export async function sampleForSchema(
  spec: Partial<IngestionSpec>,
  cacheRows: CacheRows,
): Promise<SampleResponse>
```
Import
```typescript
import { sampleForSchema } from '../utils/sampler';
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with dimensionsSpec, metricsSpec, and granularitySpec configured |
| cacheRows | CacheRows | Yes | Cached data from previous sampler calls |
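Before requesting the schema preview, a caller may want to confirm that the spec meets the inputs contract above. The following is a minimal sketch. `missingSchemaSections` is a hypothetical helper and is not part of `sampler.ts`. `SpecLike` and `DataSchemaLike` are simplified stand-ins for `Partial<IngestionSpec>`, assumed only for illustration.

```typescript
// Simplified stand-ins for Partial<IngestionSpec> (assumption; the real types are richer)
interface DataSchemaLike {
  timestampSpec?: unknown;
  dimensionsSpec?: unknown;
  metricsSpec?: unknown[];
  granularitySpec?: unknown;
  transformSpec?: unknown;
}

interface SpecLike {
  type?: string;
  spec?: {
    dataSchema?: DataSchemaLike;
    ioConfig?: unknown;
  };
}

// Hypothetical helper: returns the schema sections that are still unset,
// so the caller can postpone the schema-step sample until all are configured.
function missingSchemaSections(spec: SpecLike): string[] {
  const dataSchema: DataSchemaLike = spec.spec?.dataSchema ?? {};
  const required = ['dimensionsSpec', 'metricsSpec', 'granularitySpec'] as const;
  return required.filter(key => dataSchema[key] === undefined);
}

const example: SpecLike = {
  type: 'index_parallel',
  spec: {
    dataSchema: {
      dimensionsSpec: { dimensions: ['country'] },
      granularitySpec: { queryGranularity: 'HOUR', rollup: true },
    },
  },
};

console.log(missingSchemaSections(example)); // [ 'metricsSpec' ]
```

Returning the list of missing keys, rather than a boolean, lets a UI tell the user which section still needs configuring.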
Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of rows with final schema applied (rollup, metrics aggregated) |
| cacheKey | string or undefined | Cache key (typically not used further since this is the last sampling step) |
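A caller typically separates the rows that received the final schema from the rows the sampler could not ingest. The sketch below makes several assumptions. The `SampleEntryLike` shape, with `input`, `parsed`, `unparseable` and `error` fields, is modeled loosely on the web console's `SampleEntry`, so verify it against the actual type in `sampler.ts`. `summarizeSample` is a hypothetical helper, and the sample rows are made up.

```typescript
// Assumed, simplified shape of one sampler row; verify against SampleEntry in sampler.ts
interface SampleEntryLike {
  input?: Record<string, unknown>;
  parsed?: Record<string, unknown>;
  unparseable?: boolean;
  error?: string;
}

interface SampleSummary {
  parsedRows: Record<string, unknown>[];
  errors: string[];
}

// Hypothetical helper: splits the sampler output into rows with the final
// schema applied and error messages for rows that could not be ingested.
function summarizeSample(data: SampleEntryLike[]): SampleSummary {
  const parsedRows: Record<string, unknown>[] = [];
  const errors: string[] = [];
  for (const entry of data) {
    if (entry.unparseable || !entry.parsed) {
      errors.push(entry.error ?? 'unparseable row');
    } else {
      parsedRows.push(entry.parsed);
    }
  }
  return { parsedRows, errors };
}

// Made-up rows for illustration only
const summary = summarizeSample([
  { parsed: { __time: 1700000000000, country: 'US', city: 'NYC', event_count: 2, total_value: 15 } },
  { unparseable: true, error: 'Unparseable timestamp' },
]);
console.log(summary.parsedRows.length, summary.errors); // 1 [ 'Unparseable timestamp' ]
```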
Usage Examples
Explicit Schema with Rollup
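```typescript
import type { IngestionSpec } from '../druid-models'; // assumed location of the IngestionSpec type
import type { CacheRows } from '../utils/sampler';
import { sampleForSchema } from '../utils/sampler';

// Rows cached by an earlier sampler call
declare const cachedRows: CacheRows;

// The explicit annotation keeps literals such as 'index_parallel' from widening to string
const spec: Partial<IngestionSpec> = {
  type: 'index_parallel',
  spec: {
    dataSchema: {
      dataSource: 'events', // hypothetical datasource name
      timestampSpec: { column: 'ts', format: 'iso' },
      dimensionsSpec: {
        dimensions: [
          { type: 'string', name: 'country' },
          { type: 'string', name: 'city' },
        ],
      },
      metricsSpec: [
        { type: 'count', name: 'event_count' },
        { type: 'longSum', name: 'total_value', fieldName: 'value' },
      ],
      granularitySpec: {
        queryGranularity: 'HOUR',
        rollup: true,
      },
    },
    ioConfig: { /* ... */ },
  },
};

const result = await sampleForSchema(spec, cachedRows);
// With rollup enabled, rows that share country, city and hour bucket (ts truncated
// to HOUR) are merged into one row. event_count holds the number of merged input
// rows, and total_value holds the sum of their `value` field.
```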
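The following sketch reproduces the rollup locally on a few made-up rows, to show what the preview should contain for this spec. The steps are:

- Timestamps are truncated to the UTC hour (`queryGranularity: 'HOUR'`).
- Rows are grouped by `country`, `city` and the truncated time.
- `event_count` counts the rows in each group.
- `total_value` sums their `value`.

`rollupHourly` is a hypothetical helper that only illustrates these semantics. Druid performs the real rollup server-side. The sketch assumes ISO 8601 timestamps and UTC hour buckets.

```typescript
interface RawRow {
  ts: string; // ISO 8601 timestamp, as in timestampSpec { column: 'ts', format: 'iso' }
  country: string;
  city: string;
  value: number;
}

interface RolledUpRow {
  __time: string;
  country: string;
  city: string;
  event_count: number;
  total_value: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical local illustration of HOUR query granularity plus rollup.
function rollupHourly(rows: RawRow[]): RolledUpRow[] {
  const groups = new Map<string, RolledUpRow>();
  for (const row of rows) {
    // Truncate the timestamp to the start of its UTC hour
    const bucketMs = Math.floor(Date.parse(row.ts) / HOUR_MS) * HOUR_MS;
    const time = new Date(bucketMs).toISOString();
    // JSON tuple key avoids collisions between dimension values containing delimiters
    const key = JSON.stringify([time, row.country, row.city]);
    const existing = groups.get(key);
    if (existing) {
      existing.event_count += 1; // like the 'count' aggregator
      existing.total_value += row.value; // like 'longSum' over 'value'
    } else {
      groups.set(key, {
        __time: time,
        country: row.country,
        city: row.city,
        event_count: 1,
        total_value: row.value,
      });
    }
  }
  return Array.from(groups.values());
}

const rolled = rollupHourly([
  { ts: '2024-01-01T10:05:00Z', country: 'US', city: 'NYC', value: 5 },
  { ts: '2024-01-01T10:40:00Z', country: 'US', city: 'NYC', value: 10 },
  { ts: '2024-01-01T11:02:00Z', country: 'US', city: 'NYC', value: 7 },
  { ts: '2024-01-01T10:15:00Z', country: 'FR', city: 'Paris', value: 3 },
]);
console.log(rolled.length); // 3
```

In this sketch, the two NYC rows from the 10:00 hour collapse into one row with `event_count` 2 and `total_value` 15. The 11:02 NYC row and the Paris row each stay separate because they fall in a different hour bucket or dimension group.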