Implementation:Apache Druid SampleForFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Filtering |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A concrete sampler API client function that previews how row-level filters apply to sample data during ingestion.
Description
The sampleForFilter function applies user-defined ingestion filters to cached sample data. Like sampleForTransform, it performs a two-phase operation when transforms exist: it first auto-detects the base dimensions, then combines them with the transform output columns to build the correct dimensionsSpec before applying the filter. The response contains only the rows that pass the filter condition.
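The dimension-combining step described above can be pictured with a minimal sketch. The names `TransformLike` and `combineDimensions`, as well as the sample columns, are hypothetical and are not taken from sampler.ts; the sketch only assumes that transform output names are appended to the auto-detected base dimensions and that duplicates are dropped.

```typescript
// Hypothetical shape of a transform entry; only the output column name matters here.
interface TransformLike {
  name: string;
  expression: string;
}

// Merge auto-detected base dimensions with transform output columns,
// preserving first-seen order and dropping duplicates.
function combineDimensions(baseDimensions: string[], transforms: TransformLike[]): string[] {
  const candidates = baseDimensions.concat(transforms.map(t => t.name));
  const combined: string[] = [];
  for (const candidate of candidates) {
    if (combined.indexOf(candidate) === -1) {
      combined.push(candidate);
    }
  }
  return combined;
}

// Illustrative inputs: two detected dimensions and two transforms,
// one of which overwrites an existing column.
const dims = combineDimensions(
  ['country', 'city'],
  [
    { name: 'countryUpper', expression: 'upper("country")' },
    { name: 'city', expression: 'lower("city")' },
  ],
);

// The combined list would then be placed in the dimensionsSpec used for the filter pass.
const dimensionsSpec = { dimensions: dims };
```

Deduplicating matters because a transform may reuse the name of an existing column, and listing that column twice in the dimensionsSpec would be ambiguous.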
Usage
Call this function after data transformation when the user has defined filter conditions. The function returns the filtered subset of sample data for user verification.
Code Reference
Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L540-L607
Signature
export async function sampleForFilter(
  spec: Partial<IngestionSpec>,
  cacheRows: CacheRows,
): Promise<SampleResponse>
Import
import { sampleForFilter } from '../utils/sampler';
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with transformSpec.filter configured |
| cacheRows | CacheRows | Yes | Cached data from previous sampler calls |
Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of rows that pass the filter condition |
| cacheKey | string or undefined | Cache key for subsequent calls |
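As a rough illustration of consuming this output, the sketch below collects the parsed rows from a response. The `SampleEntryLike` and `SampleResponseLike` types are simplified assumptions made for this example (in particular the `input` and `parsed` fields); the real SampleEntry and SampleResponse definitions may differ, and the response literal is a stand-in rather than real sampler output.

```typescript
// Assumed, simplified shapes; the real SampleEntry and SampleResponse types may differ.
interface SampleEntryLike {
  input?: Record<string, unknown>;
  parsed?: Record<string, unknown>;
}

interface SampleResponseLike {
  data: SampleEntryLike[];
  cacheKey?: string;
}

// Collect the parsed rows from a response, skipping entries that have none.
function parsedRows(response: SampleResponseLike): Record<string, unknown>[] {
  const collected: Record<string, unknown>[] = [];
  for (const entry of response.data) {
    if (entry.parsed) {
      collected.push(entry.parsed);
    }
  }
  return collected;
}

// Stand-in response for illustration only.
const exampleResponse: SampleResponseLike = {
  data: [
    { input: { country: 'US' }, parsed: { country: 'US' } },
    { input: { country: 'unreadable' } },
  ],
  cacheKey: 'example-key',
};

const filteredRows = parsedRows(exampleResponse);
```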
Usage Examples
Filtering by Value
import { sampleForFilter } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    dataSchema: {
      timestampSpec: { column: 'ts', format: 'iso' },
      transformSpec: {
        transforms: [],
        // Selector filter: keep rows whose 'country' dimension equals 'US'
        filter: {
          type: 'selector',
          dimension: 'country',
          value: 'US',
        },
      },
    },
    ioConfig: { /* ... */ },
  },
};

// cachedRows is the CacheRows value obtained from a previous sampler call
const result = await sampleForFilter(spec, cachedRows);
// Only rows where country = 'US' are returned
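
To sanity-check what the preview should contain, the selector condition can be mirrored locally on a few hand-written rows. This is a hedged sketch only: `SelectorFilterLike` and `applySelectorLocally` are hypothetical helpers, not part of the web console, and the sketch does not model how Druid handles null or multi-value dimensions.

```typescript
// Hypothetical local mirror of a selector filter, for checking expectations only.
interface SelectorFilterLike {
  type: 'selector';
  dimension: string;
  value: string;
}

type Row = Record<string, unknown>;

// Keep rows whose dimension value is present and, compared as a string,
// equals the filter value.
function applySelectorLocally(rows: Row[], filter: SelectorFilterLike): Row[] {
  return rows.filter(row => {
    const cell = row[filter.dimension];
    return cell !== null && cell !== undefined && String(cell) === filter.value;
  });
}

// Hand-written rows for illustration only.
const sampleRows: Row[] = [
  { ts: '2024-01-01T00:00:00Z', country: 'US' },
  { ts: '2024-01-01T00:01:00Z', country: 'FR' },
  { ts: '2024-01-01T00:02:00Z', country: 'US' },
  { ts: '2024-01-01T00:03:00Z' },
];

const kept = applySelectorLocally(sampleRows, {
  type: 'selector',
  dimension: 'country',
  value: 'US',
});
```

Comparing the rows kept locally with the rows in the sampler response is a quick way to spot a misspelled dimension name or an unexpected value format before running the full ingestion.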