# Implementation: Apache Druid sampleForConnect
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
## Overview
Concrete sampler API client function for establishing source connectivity and retrieving initial sample data from external data sources.
## Description
The sampleForConnect function sends a sampling request to the Druid Sampler API endpoint (POST /druid/indexer/v1/sampler). It constructs a SampleSpec from the current ingestion spec, configures the appropriate input format based on source type (Kafka uses a special Kafka-wrapped format, Kinesis uses a Kinesis-wrapped format), and handles Druid reindexing mode by additionally querying column metadata and aggregator information.
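The request flow described above can be sketched as follows. This is a minimal illustration, not the console's actual code: `buildSampleSpec` is a hypothetical helper, and the default row/timeout values are assumptions, though the `samplerConfig` field names (`numRows`, `timeoutMs`) follow the public Druid sampler API.

```typescript
// Illustrative sketch of the sampler request payload. `buildSampleSpec` is a
// hypothetical helper, not the actual web-console implementation.
interface SamplerConfig {
  numRows: number;   // maximum number of rows to sample
  timeoutMs: number; // give up sampling after this long
}

interface SampleSpec {
  type: string;                  // e.g. 'index_parallel' or 'kafka'
  spec: Record<string, unknown>; // ioConfig (and dataSchema) from the ingestion spec
  samplerConfig: SamplerConfig;
}

function buildSampleSpec(
  ingestionSpec: { type: string; spec: Record<string, unknown> },
  numRows = 500,      // assumed default, for illustration only
  timeoutMs = 15000,  // assumed default, for illustration only
): SampleSpec {
  return {
    type: ingestionSpec.type,
    spec: ingestionSpec.spec,
    samplerConfig: { numRows, timeoutMs },
  };
}

// The console then POSTs a payload of this shape to /druid/indexer/v1/sampler.
```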
## Usage
Call this function after the user has configured the input source connection parameters (e.g., S3 bucket URI, Kafka bootstrap servers, local file path). It is the first API call in both the batch and streaming ingestion wizard flows.
## Code Reference

### Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L265-L349
### Signature

```typescript
export async function sampleForConnect(
  spec: Partial<IngestionSpec>,
  sampleStrategy: SampleStrategy,
): Promise<SampleResponseWithExtraInfo>
```
### Import

```typescript
import { sampleForConnect } from '../utils/sampler';
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with ioConfig.inputSource populated (S3 URIs, Kafka config, etc.) |
| sampleStrategy | SampleStrategy | Yes | Sampling approach configuration |
### Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of sampled raw data rows |
| cacheKey | string (optional) | Cache key for reusing sample data in subsequent sampler calls |
| columns | string[] (optional) | Column names (only for Druid reindexing sources) |
| rollup | boolean (optional) | Whether source uses rollup (only for reindexing) |
| columnInfo | Record (optional) | Column type metadata (only for reindexing) |
| aggregators | Record (optional) | Aggregator definitions (only for reindexing) |
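Taken together, the return value can be approximated by the interface below. This is a sketch reconstructed from the table above, not the authoritative type: the per-row `SampleEntry` fields in particular are assumptions, and the real definitions live in `web-console/src/utils/sampler.ts`.

```typescript
// Approximate shape of the sampler result, reconstructed from the I/O table.
// Field names on SampleEntry (input/parsed/error) are assumptions; consult
// web-console/src/utils/sampler.ts for the authoritative definitions.
interface SampleEntry {
  input: Record<string, unknown>;   // raw row as read from the source
  parsed?: Record<string, unknown>; // parsed row, when parsing succeeded
  error?: string;                   // per-row parse error, if any
}

interface SampleResponseWithExtraInfo {
  data: SampleEntry[];
  cacheKey?: string;
  // Present only when sampling a Druid (reindexing) source:
  columns?: string[];
  rollup?: boolean;
  columnInfo?: Record<string, unknown>;
  aggregators?: Record<string, unknown>;
}

// Example value conforming to the sketched shape:
const example: SampleResponseWithExtraInfo = {
  data: [{ input: { ts: '2000-01-01', value: 1 } }],
  cacheKey: 'abc123',
};
```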
## Usage Examples

### Basic S3 Connection

```typescript
import { sampleForConnect } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    ioConfig: {
      inputSource: {
        type: 's3',
        uris: ['s3://my-bucket/data/events.json'],
      },
    },
  },
};

const result = await sampleForConnect(spec, 'start');
// result.data contains raw sample rows
// result.cacheKey can be passed to subsequent sampler calls
```
### Kafka Streaming Connection

```typescript
const kafkaSpec = {
  type: 'kafka',
  spec: {
    ioConfig: {
      consumerProperties: {
        'bootstrap.servers': 'kafka-broker:9092',
      },
      topic: 'my-events-topic',
    },
  },
};

const result = await sampleForConnect(kafkaSpec, 'start');
// Kafka sources use a special KAFKA_SAMPLE_INPUT_FORMAT wrapper
```
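The Kafka-specific wrapping mentioned in the comment above can be sketched roughly as follows. Treat `wrapKafkaInputFormat` as a hypothetical helper: Druid's `kafka` input format nests the record-value format under `valueFormat`, but the exact constant and wiring in `sampler.ts` may differ.

```typescript
// Hypothetical sketch of wrapping a user-chosen input format for Kafka
// sampling; the helper name is illustrative, not the console's actual code.
interface InputFormat {
  type: string;
  [key: string]: unknown;
}

interface KafkaInputFormat {
  type: 'kafka';
  valueFormat: InputFormat; // how to parse each Kafka record's value
}

function wrapKafkaInputFormat(valueFormat: InputFormat): KafkaInputFormat {
  return { type: 'kafka', valueFormat };
}
```

A plain `{ type: 'json' }` format chosen by the user would thus be sent to the sampler as `{ type: 'kafka', valueFormat: { type: 'json' } }`.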