Implementation:Apache Druid SampleForConnect


Knowledge Sources
Domains Data_Ingestion, Data_Sampling
Last Updated 2026-02-10 00:00 GMT

Overview

sampleForConnect is the concrete sampler API client function that establishes connectivity to an external data source and retrieves an initial sample of its data.

Description

The sampleForConnect function sends a sampling request to the Druid Sampler API endpoint (POST /druid/indexer/v1/sampler). It constructs a SampleSpec from the current ingestion spec, configures the appropriate input format based on source type (Kafka uses a special Kafka-wrapped format, Kinesis uses a Kinesis-wrapped format), and handles Druid reindexing mode by additionally querying column metadata and aggregator information.
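The request-building step described above can be sketched roughly as follows. This is a minimal illustration and not the code in sampler.ts: buildSamplerRequest, pickInputFormat, the placeholder wrapper formats, the raw fallback format, and the numRows/timeoutMs values are all assumptions made for the example. Only the endpoint path and the Kafka/Kinesis special-casing come from the description.

```typescript
// Hypothetical sketch; none of these names are taken from sampler.ts.
const SAMPLER_ENDPOINT = '/druid/indexer/v1/sampler';

type InputFormat = Record<string, unknown>;

// Placeholder wrapper formats: the real Kafka and Kinesis wrappers carry more fields.
const KAFKA_WRAPPED_FORMAT: InputFormat = { type: 'kafka' };
const KINESIS_WRAPPED_FORMAT: InputFormat = { type: 'kinesis' };
// Assumed fallback that captures each record as a single raw string column.
const RAW_FORMAT: InputFormat = { type: 'regex', pattern: '([\\s\\S]*)', columns: ['raw'] };

interface SamplerRequest {
  method: 'POST';
  url: string;
  body: {
    type: string;
    spec: { ioConfig: Record<string, unknown> };
    samplerConfig: { numRows: number; timeoutMs: number };
  };
}

// Choose the input format from the source type, as the description outlines.
function pickInputFormat(specType: string): InputFormat {
  switch (specType) {
    case 'kafka':
      return KAFKA_WRAPPED_FORMAT;
    case 'kinesis':
      return KINESIS_WRAPPED_FORMAT;
    default:
      return RAW_FORMAT;
  }
}

// Assemble the POST request; the type is passed through unchanged here,
// although the real client may normalize it before sending.
function buildSamplerRequest(
  specType: string,
  ioConfig: Record<string, unknown>,
): SamplerRequest {
  return {
    method: 'POST',
    url: SAMPLER_ENDPOINT,
    body: {
      type: specType,
      spec: { ioConfig: { ...ioConfig, inputFormat: pickInputFormat(specType) } },
      samplerConfig: { numRows: 500, timeoutMs: 15000 }, // illustrative values
    },
  };
}
```

Keeping the request construction as a pure function, separate from the network call, makes the format selection easy to check in isolation.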

Usage

Call this function after the user has configured the input source connection parameters (e.g., S3 bucket URI, Kafka bootstrap servers, local file path). It is the first API call in both the batch and streaming ingestion wizard flows.
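One way a caller might confirm that the connection parameters are in place before sampling is sketched below. The helper isReadyToSample and the DraftSpec types are hypothetical and exist only for this illustration; the field names mirror the usage examples on this page, and the real wizard may validate differently.

```typescript
// Hypothetical helper; the field names mirror the usage examples on this page.
interface DraftIoConfig {
  inputSource?: { type?: string; uris?: string[] };
  consumerProperties?: Record<string, unknown>;
  topic?: string;
}

interface DraftSpec {
  type?: string;
  spec?: { ioConfig?: DraftIoConfig };
}

// Returns true when the draft spec carries enough connection detail to attempt a sample.
function isReadyToSample(draft: DraftSpec): boolean {
  const ioConfig = draft.spec?.ioConfig;
  if (!ioConfig) return false;

  if (draft.type === 'kafka') {
    // Streaming case: require a broker list and a topic.
    const servers = ioConfig.consumerProperties?.['bootstrap.servers'];
    return typeof servers === 'string' && servers.length > 0 && Boolean(ioConfig.topic);
  }

  // Batch case: require an input source with a declared type.
  return Boolean(ioConfig.inputSource?.type);
}
```

A check like this only guards against calling the sampler with an empty draft; whether the source is actually reachable is determined by the sampler call itself.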

Code Reference

Source Location

  • Repository: Apache Druid
  • File: web-console/src/utils/sampler.ts
  • Lines: L265-L349

Signature

export async function sampleForConnect(
  spec: Partial<IngestionSpec>,
  sampleStrategy: SampleStrategy,
): Promise<SampleResponseWithExtraInfo>

Import

import { sampleForConnect } from '../utils/sampler';

I/O Contract

Inputs

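Name | Type | Required | Description
spec | Partial<IngestionSpec> | Yes | Ingestion spec with the connection parameters populated under ioConfig (for example, inputSource with S3 URIs, or Kafka consumerProperties and topic)
sampleStrategy | SampleStrategy | Yes | Sampling strategy to apply (the usage examples pass 'start')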

Outputs

Name | Type | Description
data | SampleEntry[] | Array of sampled raw data rows
cacheKey | string or undefined | Cache key for reusing sample data in subsequent sampler calls
columns | string[] (optional) | Column names (only for Druid reindexing sources)
rollup | boolean (optional) | Whether the source uses rollup (only for reindexing)
columnInfo | Record (optional) | Column type metadata (only for reindexing)
aggregators | Record (optional) | Aggregator definitions (only for reindexing)
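The output fields listed above could be modeled and consumed along these lines. The SampleResponseSketch interface is reconstructed from the table and is an assumption, as are the hasReindexInfo and summarizeSample helpers; the real SampleResponseWithExtraInfo and SampleEntry definitions may differ.

```typescript
// Hypothetical types reconstructed from the Outputs table; the real definitions may differ.
type SampleEntry = Record<string, unknown>;

interface SampleResponseSketch {
  data: SampleEntry[];
  cacheKey?: string;
  columns?: string[];
  rollup?: boolean;
  columnInfo?: Record<string, unknown>;
  aggregators?: Record<string, unknown>;
}

// True when the response carries the extra column list reported for reindexing sources.
function hasReindexInfo(
  response: SampleResponseSketch,
): response is SampleResponseSketch & { columns: string[] } {
  return Array.isArray(response.columns);
}

// Build a one-line summary of a sample response for logging or display.
function summarizeSample(response: SampleResponseSketch): string {
  const rows = `${response.data.length} row(s)`;
  const cache = response.cacheKey ? `cacheKey=${response.cacheKey}` : 'no cacheKey';
  const reindex = hasReindexInfo(response)
    ? `, ${response.columns.length} column(s), rollup=${String(response.rollup)}`
    : '';
  return `${rows}, ${cache}${reindex}`;
}
```

Treating the reindexing fields as optional and narrowing with a type guard keeps the common batch and streaming paths free of checks they do not need.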

Usage Examples

Basic S3 Connection

import { sampleForConnect } from '../utils/sampler';

const spec = {
  type: 'index_parallel',
  spec: {
    ioConfig: {
      inputSource: {
        type: 's3',
        uris: ['s3://my-bucket/data/events.json'],
      },
    },
  },
};

const result = await sampleForConnect(spec, 'start');
// result.data contains raw sample rows
// result.cacheKey can be passed to subsequent sampler calls

Kafka Streaming Connection

import { sampleForConnect } from '../utils/sampler';

const kafkaSpec = {
  type: 'kafka',
  spec: {
    ioConfig: {
      consumerProperties: {
        'bootstrap.servers': 'kafka-broker:9092',
      },
      topic: 'my-events-topic',
    },
  },
};

const result = await sampleForConnect(kafkaSpec, 'start');
// Kafka sources use a special KAFKA_SAMPLE_INPUT_FORMAT wrapper
