Implementation:Apache Druid SampleForParser
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Parsing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A concrete sampler API client function that applies the configured input format to raw sample data.
Description
The sampleForParser function sends a sampling request to the Druid Sampler API with the user-configured inputFormat applied. It uses the applyCache() utility to leverage previously cached sample data from the connection step, avoiding redundant reads from the external source. For reindexing mode, it uses a special detection timestamp spec to properly handle existing Druid columns.
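The request-building logic described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the web console's code: `buildParserRequest`, the `SamplerRequest` shape, the two timestamp-spec constants and the `numRows` value are all hypothetical, and the cache is modeled as a key on `samplerConfig`, whereas the real applyCache() may reuse cached rows differently.

```typescript
// Hypothetical, simplified shapes; the real web-console types are richer.
interface TimestampSpec {
  column: string;
  format: string;
  missingValue?: string;
}

interface SamplerRequest {
  type: string;
  spec: {
    ioConfig: Record<string, unknown>;
    dataSchema: { dataSource: string; timestampSpec: TimestampSpec };
  };
  samplerConfig: { numRows: number; cacheKey?: string };
}

// Assumed placeholder for ordinary sources: it names a column that does not
// exist and supplies missingValue, so every row gets a fixed timestamp and
// parsing can be previewed before a time column is chosen.
const PLACEHOLDER_TIMESTAMP_SPEC: TimestampSpec = {
  column: '!!!_no_such_column_!!!',
  format: 'auto',
  missingValue: '1970-01-01T00:00:00Z',
};

// Assumed spec for reindexing existing Druid data, whose rows already carry __time.
const REINDEX_TIMESTAMP_SPEC: TimestampSpec = { column: '__time', format: 'millis' };

// Hypothetical helper: build a parser-stage sampler request.
function buildParserRequest(
  type: string,
  ioConfig: Record<string, unknown>,
  reindex: boolean,
  cacheKey?: string,
): SamplerRequest {
  const request: SamplerRequest = {
    type,
    spec: {
      ioConfig,
      dataSchema: {
        dataSource: 'sample',
        timestampSpec: reindex ? REINDEX_TIMESTAMP_SPEC : PLACEHOLDER_TIMESTAMP_SPEC,
      },
    },
    samplerConfig: { numRows: 500 }, // arbitrary sample size for this sketch
  };
  // Stand-in for applyCache(): attach the key from the connect step so the
  // sampler can reuse cached rows instead of re-reading the external source.
  if (cacheKey !== undefined) {
    request.samplerConfig.cacheKey = cacheKey;
  }
  return request;
}
```

Choosing the timestamp spec in one place keeps the reindexing special case out of the rest of the request.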
Usage
Call this function after the source connection step succeeds and the user has selected or confirmed an input format. It returns parsed rows that show column names and typed values for the user to verify.
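For the verification step, a caller might summarize the parsed rows before showing them. The sketch below assumes a simplified `SampleEntry` with optional `parsed`, `unparseable` and `error` fields; `parsedColumnNames` and `countUnparseable` are hypothetical helpers, not web-console functions.

```typescript
// Hypothetical, simplified stand-in for the web console's SampleEntry type.
interface SampleEntry {
  input?: Record<string, unknown>;
  parsed?: Record<string, unknown>;
  unparseable?: boolean;
  error?: string;
}

// Collect every column name seen in any successfully parsed row,
// in first-seen order, so the UI can list them for verification.
function parsedColumnNames(rows: SampleEntry[]): string[] {
  const seen = new Set<string>();
  for (const row of rows) {
    if (row.unparseable || !row.parsed) continue;
    for (const key of Object.keys(row.parsed)) seen.add(key);
  }
  return Array.from(seen);
}

// Count rows the chosen inputFormat could not parse; many such rows suggest
// the wrong format (for example csv chosen for JSON data).
function countUnparseable(rows: SampleEntry[]): number {
  return rows.filter(row => row.unparseable === true).length;
}
```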
Code Reference
Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L351-L384
Signature
```typescript
export async function sampleForParser(
  spec: Partial<IngestionSpec>,
  sampleStrategy: SampleStrategy,
): Promise<SampleResponse>
```
Import
```typescript
import { sampleForParser } from '../utils/sampler';
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with ioConfig.inputFormat configured (json, csv, tsv, parquet, etc.) |
| sampleStrategy | SampleStrategy | Yes | Sampling strategy forwarded with the request (the examples below pass 'start'); cached data from the connection step is reused through applyCache() |
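Because the spec must already carry ioConfig.inputFormat, a caller could check for it before contacting the sampler. This is a minimal sketch over a hypothetical `PartialSpecLike` shape (much smaller than the real `IngestionSpec`); `requireInputFormatType` is an illustrative helper, not part of sampler.ts.

```typescript
// Hypothetical minimal shape; the real IngestionSpec type is much richer.
interface InputFormat {
  type: string;
  [key: string]: unknown;
}

interface PartialSpecLike {
  type?: string;
  spec?: { ioConfig?: { inputFormat?: InputFormat } };
}

// Return the configured inputFormat type, or throw if it is missing,
// so the problem surfaces before any sampler request is sent.
function requireInputFormatType(spec: PartialSpecLike): string {
  const inputFormat = spec.spec?.ioConfig?.inputFormat;
  if (!inputFormat || typeof inputFormat.type !== 'string') {
    throw new Error('spec.ioConfig.inputFormat.type must be set before parser sampling');
  }
  return inputFormat.type;
}
```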
Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of parsed rows with column names and values |
| cacheKey | string or undefined | Updated cache key for subsequent sampler calls |
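Since the response may carry an updated cacheKey, a wizard would typically store it for the next sampler call. The sketch below assumes a hypothetical `SamplerSession` object and keeps the previous key when a response omits one; that fallback is a design choice of this sketch, not documented behavior.

```typescript
// Hypothetical, simplified response shape based on the Outputs table.
interface SampleResponseLike {
  data: Array<{ parsed?: Record<string, unknown> }>;
  cacheKey?: string;
}

// Hypothetical state carried between wizard steps.
interface SamplerSession {
  cacheKey?: string;
}

// Adopt the newest cacheKey; fall back to the previous one if the
// response does not include a key (assumption of this sketch).
function updateSession(session: SamplerSession, response: SampleResponseLike): SamplerSession {
  return { cacheKey: response.cacheKey ?? session.cacheKey };
}
```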
Usage Examples
Parsing JSON Data
```typescript
import { sampleForParser } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    ioConfig: {
      inputSource: { type: 's3', uris: ['s3://bucket/data.json'] },
      inputFormat: { type: 'json' },
    },
  },
};

const result = await sampleForParser(spec, 'start');
// Illustrative shape of a parsed row:
// result.data[0].parsed = { timestamp: '2024-01-01', user: 'alice', count: 42 }
```
Parsing CSV with Header
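```typescript
import { sampleForParser } from '../utils/sampler';

const csvSpec = {
  type: 'index_parallel',
  spec: {
    ioConfig: {
      inputSource: { type: 'http', uris: ['https://example.com/data.csv'] },
      inputFormat: {
        type: 'csv',
        findColumnsFromHeader: true, // take column names from the header row
        skipHeaderRows: 0, // no leading rows to skip before the header
      },
    },
  },
};

const result = await sampleForParser(csvSpec, 'start');
```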
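To make the two CSV options concrete, here is a local model of how they interact, following the Druid documentation's note that skipHeaderRows is applied before column names are read from the header. `parseCsvLikeDruid` is a hypothetical helper that assumes naive comma splitting with no quoting or escaping, and it drops empty lines; it only illustrates the ordering, not Druid's actual parser.

```typescript
// Hypothetical local model of Druid's CSV findColumnsFromHeader / skipHeaderRows.
// Assumptions: naive comma splitting, no quoting, empty lines ignored.
function parseCsvLikeDruid(
  text: string,
  findColumnsFromHeader: boolean,
  skipHeaderRows: number,
  columns: string[] = [],
): Array<Record<string, string>> {
  let lines = text.split('\n').filter(line => line.length > 0);
  // skipHeaderRows is applied first...
  lines = lines.slice(skipHeaderRows);
  let names = columns;
  // ...then, if requested, the next remaining row supplies the column names.
  if (findColumnsFromHeader) {
    if (lines.length === 0) return [];
    names = lines[0].split(',');
    lines = lines.slice(1);
  }
  return lines.map(line => {
    const values = line.split(',');
    const row: Record<string, string> = {};
    names.forEach((name, i) => {
      row[name] = i < values.length ? values[i] : '';
    });
    return row;
  });
}
```

With a one-line preamble before the header, skipHeaderRows would be 1 so that the header is still found.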