Principle:Apache Flink File Source Builder Configuration

From Leeroopedia


Knowledge Sources
Domains Stream_Processing, File_IO
Last Updated 2026-02-09 00:00 GMT

Overview

A builder-based configuration pattern that constructs file source connectors by specifying input paths, format readers, and operational modes (bounded or continuous).

Description

The File Source Builder Configuration principle enables construction of file-based data sources through the FLIP-27 source framework. It separates the concern of which format to read (stream-wise or bulk) from where to read and how to discover files. The builder selects appropriate defaults for file enumeration and split assignment based on the format's splittability.

Two format types are supported:

  • Stream Format: Records read one at a time (e.g., line-by-line text reading)
  • Bulk Format: Records read in batches (e.g., Parquet row groups)
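
The difference between the two access styles can be sketched in plain Java. This is a toy illustration, not Flink's StreamFormat or BulkFormat interfaces: the class FormatStyles and its methods readOne and readBatch are hypothetical names, and an in-memory Iterator stands in for a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy illustration only. FormatStyles and its methods are hypothetical names,
// not Flink's StreamFormat or BulkFormat interfaces.
final class FormatStyles {

    // Stream-style access: return one record per call, or null at end of input.
    static <T> T readOne(Iterator<T> source) {
        return source.hasNext() ? source.next() : null;
    }

    // Bulk-style access: return up to batchSize records per call,
    // or an empty list at end of input.
    static <T> List<T> readBatch(Iterator<T> source, int batchSize) {
        List<T> batch = new ArrayList<>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");

        // One record per call.
        Iterator<String> streamSource = records.iterator();
        String record;
        while ((record = readOne(streamSource)) != null) {
            System.out.println("record: " + record);
        }

        // Up to two records per call.
        Iterator<String> bulkSource = records.iterator();
        List<String> batch;
        while (!(batch = readBatch(bulkSource, 2)).isEmpty()) {
            System.out.println("batch: " + batch);
        }
    }
}
```

The batch variant amortizes per-call overhead across several records, which is one reason columnar formats tend to be read in bulk.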

The builder also supports two operational modes:

  • Bounded: Process a static set of files once and then finish (processStaticFileSet, the default)
  • Continuous: Periodically scan the input paths for new files at a configured discovery interval (monitorContinuously)
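
The two modes differ mainly in how often file discovery runs. The following is a minimal sketch under stated assumptions, not Flink's enumerator code: the class FileDiscovery and its method discoverNew are hypothetical, and a plain list of path strings stands in for a directory listing. Bounded mode corresponds to a single discovery pass, continuous mode to repeated passes that report only paths not seen before.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of file discovery; not Flink's implementation.
final class FileDiscovery {
    // Paths already reported by an earlier pass.
    private final Set<String> alreadySeen = new HashSet<>();

    // Returns the paths in currentListing that no earlier call has returned,
    // preserving listing order.
    List<String> discoverNew(List<String> currentListing) {
        List<String> fresh = new ArrayList<>();
        for (String path : currentListing) {
            if (alreadySeen.add(path)) { // add() is true only for unseen paths
                fresh.add(path);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        FileDiscovery discovery = new FileDiscovery();

        // Bounded mode: one pass over the listing, then done.
        System.out.println(discovery.discoverNew(Arrays.asList("a.txt", "b.txt")));

        // Continuous mode: a later pass sees a grown directory and
        // reports only the newly arrived file.
        System.out.println(discovery.discoverNew(Arrays.asList("a.txt", "b.txt", "c.txt")));
    }
}
```

In a real deployment the repeated passes would be driven by a timer at the configured interval; the sketch calls the method twice by hand to keep it deterministic.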

Usage

Use this principle when building a Flink pipeline that reads from files. Choose stream format for text-based inputs and bulk format for columnar/binary formats. Use bounded mode when the set of input files is fixed, and continuous mode for streaming ingestion from growing directories.

Theoretical Basis

// Abstract algorithm
1. Select format type (stream or bulk) and specify input paths
2. System auto-selects enumerator based on format splittability:
   - Splittable format -> BlockSplittingRecursiveEnumerator
   - Non-splittable format -> NonSplittingRecursiveEnumerator
3. System defaults to LocalityAwareSplitAssigner
4. Optionally configure continuous monitoring interval
5. Build the immutable FileSource instance
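
The steps above can be sketched as a small builder in plain Java. This is a toy model under stated assumptions, not the FileSource implementation: ToyFileSource, forFormat, and the boolean splittable flag are hypothetical, and the enumerator and assigner are represented only by their names as quoted in the steps.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the steps above. ToyFileSource, forFormat, and the splittable
// flag are hypothetical; only the enumerator and assigner names and the two
// mode method names are taken from the text.
final class ToyFileSource {
    final List<String> paths;
    final String enumeratorName;
    final String assignerName;
    final Duration discoveryInterval; // null means bounded mode

    private ToyFileSource(Builder b) {
        this.paths = Collections.unmodifiableList(Arrays.asList(b.paths.clone()));
        // Step 2: pick the enumerator from the format's splittability.
        this.enumeratorName = b.splittable
                ? "BlockSplittingRecursiveEnumerator"
                : "NonSplittingRecursiveEnumerator";
        // Step 3: default split assigner.
        this.assignerName = "LocalityAwareSplitAssigner";
        this.discoveryInterval = b.discoveryInterval;
    }

    boolean isBounded() {
        return discoveryInterval == null;
    }

    // Step 1: choose the format (reduced here to its splittability) and paths.
    static Builder forFormat(boolean splittable, String... paths) {
        return new Builder(splittable, paths);
    }

    static final class Builder {
        private final boolean splittable;
        private final String[] paths;
        private Duration discoveryInterval; // bounded unless set

        private Builder(boolean splittable, String[] paths) {
            if (paths.length == 0) {
                throw new IllegalArgumentException("at least one input path is required");
            }
            this.splittable = splittable;
            this.paths = paths;
        }

        // Step 4 (optional): switch to continuous mode.
        Builder monitorContinuously(Duration interval) {
            if (interval == null || interval.isZero() || interval.isNegative()) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.discoveryInterval = interval;
            return this;
        }

        Builder processStaticFileSet() {
            this.discoveryInterval = null;
            return this;
        }

        // Step 5: produce the immutable source.
        ToyFileSource build() {
            return new ToyFileSource(this);
        }
    }

    public static void main(String[] args) {
        ToyFileSource source = ToyFileSource.forFormat(true, "/data/in")
                .monitorContinuously(Duration.ofSeconds(10))
                .build();
        System.out.println(source.enumeratorName + ", bounded=" + source.isBounded());
    }
}
```

Keeping all fields final and exposing no setters on the built object mirrors the immutability called out in step 5; configuration changes are only possible on the builder.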

Related Pages

Implemented By
