# Implementation: Apache Druid sampleForConnect
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
## Overview
Concrete sampler API client function for establishing source connectivity and retrieving initial sample data from external data sources.
## Description
The sampleForConnect function sends a sampling request to the Druid Sampler API endpoint (POST /druid/indexer/v1/sampler). It constructs a SampleSpec from the current ingestion spec, configures the appropriate input format based on source type (Kafka uses a special Kafka-wrapped format, Kinesis uses a Kinesis-wrapped format), and handles Druid reindexing mode by additionally querying column metadata and aggregator information.
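The request flow described above can be sketched as follows. This is a minimal illustration, not the console's actual code: `buildSampleSpec` is a hypothetical helper, and the default row/timeout values are assumptions, though the `samplerConfig` field names (`numRows`, `timeoutMs`) follow the public Druid sampler API.

```typescript
// Illustrative sketch of the sampler request payload. `buildSampleSpec` is a
// hypothetical helper, not the actual web-console implementation.
interface SamplerConfig {
  numRows: number;   // maximum number of rows to sample
  timeoutMs: number; // give up sampling after this long
}

interface SampleSpec {
  type: string;                  // e.g. 'index_parallel' or 'kafka'
  spec: Record<string, unknown>; // ioConfig (and dataSchema) from the ingestion spec
  samplerConfig: SamplerConfig;
}

function buildSampleSpec(
  ingestionSpec: { type: string; spec: Record<string, unknown> },
  numRows = 500,      // assumed default, for illustration only
  timeoutMs = 15000,  // assumed default, for illustration only
): SampleSpec {
  return {
    type: ingestionSpec.type,
    spec: ingestionSpec.spec,
    samplerConfig: { numRows, timeoutMs },
  };
}

// The console then POSTs a payload of this shape to /druid/indexer/v1/sampler.
```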
## Usage
Call this function after the user has configured the input source connection parameters (e.g., S3 bucket URI, Kafka bootstrap servers, local file path). It is the first API call in both the batch and streaming ingestion wizard flows.
## Code Reference

### Source Location
- Repository: Apache Druid
- File: web-console/src/utils/sampler.ts
- Lines: L265-L349
### Signature

```typescript
export async function sampleForConnect(
  spec: Partial<IngestionSpec>,
  sampleStrategy: SampleStrategy,
): Promise<SampleResponseWithExtraInfo>
```
### Import

```typescript
import { sampleForConnect } from '../utils/sampler';
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec | Partial<IngestionSpec> | Yes | Ingestion spec with ioConfig.inputSource populated (S3 URIs, Kafka config, etc.) |
| sampleStrategy | SampleStrategy | Yes | Sampling approach configuration |
### Outputs
| Name | Type | Description |
|---|---|---|
| data | SampleEntry[] | Array of sampled raw data rows |
| cacheKey | string (optional) | Cache key for reusing sample data in subsequent sampler calls |
| columns | string[] (optional) | Column names (only for Druid reindexing sources) |
| rollup | boolean (optional) | Whether source uses rollup (only for reindexing) |
| columnInfo | Record (optional) | Column type metadata (only for reindexing) |
| aggregators | Record (optional) | Aggregator definitions (only for reindexing) |
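Taken together, the return value can be approximated by the interface below. This is a sketch reconstructed from the table above, not the authoritative type: the per-row `SampleEntry` fields in particular are assumptions, and the real definitions live in `web-console/src/utils/sampler.ts`.

```typescript
// Approximate shape of the sampler result, reconstructed from the I/O table.
// Field names on SampleEntry (input/parsed/error) are assumptions; consult
// web-console/src/utils/sampler.ts for the authoritative definitions.
interface SampleEntry {
  input: Record<string, unknown>;   // raw row as read from the source
  parsed?: Record<string, unknown>; // parsed row, when parsing succeeded
  error?: string;                   // per-row parse error, if any
}

interface SampleResponseWithExtraInfo {
  data: SampleEntry[];
  cacheKey?: string;
  // Present only when sampling a Druid (reindexing) source:
  columns?: string[];
  rollup?: boolean;
  columnInfo?: Record<string, unknown>;
  aggregators?: Record<string, unknown>;
}

// Example value conforming to the sketched shape:
const example: SampleResponseWithExtraInfo = {
  data: [{ input: { ts: '2000-01-01', value: 1 } }],
  cacheKey: 'abc123',
};
```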
## Usage Examples

### Basic S3 Connection

```typescript
import { sampleForConnect } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    ioConfig: {
      inputSource: {
        type: 's3',
        uris: ['s3://my-bucket/data/events.json'],
      },
    },
  },
};

const result = await sampleForConnect(spec, 'start');
// result.data contains raw sample rows
// result.cacheKey can be passed to subsequent sampler calls
```
### Kafka Streaming Connection

```typescript
const kafkaSpec = {
  type: 'kafka',
  spec: {
    ioConfig: {
      consumerProperties: {
        'bootstrap.servers': 'kafka-broker:9092',
      },
      topic: 'my-events-topic',
    },
  },
};

const result = await sampleForConnect(kafkaSpec, 'start');
// Kafka sources use a special KAFKA_SAMPLE_INPUT_FORMAT wrapper
```
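The Kafka-specific wrapping mentioned in the comment above can be sketched roughly as follows. Treat `wrapKafkaInputFormat` as a hypothetical helper: Druid's `kafka` input format nests the record-value format under `valueFormat`, but the exact constant and wiring in `sampler.ts` may differ.

```typescript
// Hypothetical sketch of wrapping a user-chosen input format for Kafka
// sampling; the helper name is illustrative, not the console's actual code.
interface InputFormat {
  type: string;
  [key: string]: unknown;
}

interface KafkaInputFormat {
  type: 'kafka';
  valueFormat: InputFormat; // how to parse each Kafka record's value
}

function wrapKafkaInputFormat(valueFormat: InputFormat): KafkaInputFormat {
  return { type: 'kafka', valueFormat };
}
```

A plain `{ type: 'json' }` format chosen by the user would thus be sent to the sampler as `{ type: 'kafka', valueFormat: { type: 'json' } }`.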