Principle:Apache Flink File Source Builder Configuration
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, File_IO |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A builder-based configuration pattern that constructs file source connectors by specifying input paths, format readers, and operational modes (bounded or continuous).
Description
The File Source Builder Configuration principle covers the construction of file-based data sources on top of the FLIP-27 source framework. It separates what format to read (record-at-a-time stream formats or batched bulk formats) from where to read and how files are discovered. The builder selects default strategies for file enumeration and split assignment based on the format's splittability.
Two format types are supported:
- Stream Format: Records read one at a time (e.g., line-by-line text reading)
- Bulk Format: Records read in batches (e.g., Parquet row groups)
The builder also supports two operational modes:
- Bounded: Process a static set of files once (processStaticFileSet(); this is the default when no mode is set)
- Continuous: Periodically monitor the input paths for new files (monitorContinuously(Duration))
Usage
Use this principle when building a Flink pipeline that reads from files. Choose stream format for text-based inputs and bulk format for columnar/binary formats. Use continuous mode for streaming ingestion from growing directories.
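The two operational modes can be sketched with the builder as follows. This is a minimal sketch, assuming the flink-connector-files dependency is on the classpath and a Flink version that ships TextLineInputFormat (1.15 or later). The class name, helper method names, input path, and 30-second interval are hypothetical and chosen only for illustration.

```java
import java.time.Duration;

import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;

public class FileSourceModesExample {

    /** Bounded mode: the files present under the path are read once. */
    static FileSource<String> boundedTextSource(Path inputDir) {
        return FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), inputDir)
                .processStaticFileSet() // explicit here, although bounded is the default
                .build();
    }

    /** Continuous mode: the path is re-scanned for new files at the given interval. */
    static FileSource<String> continuousTextSource(Path inputDir, Duration interval) {
        return FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), inputDir)
                .monitorContinuously(interval)
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical input directory; replace with a real location.
        Path inputDir = new Path("/data/input");

        FileSource<String> bounded = boundedTextSource(inputDir);
        FileSource<String> continuous =
                continuousTextSource(inputDir, Duration.ofSeconds(30));

        // Print the boundedness each source reports to the runtime.
        System.out.println(bounded.getBoundedness());
        System.out.println(continuous.getBoundedness());
    }
}
```

Either source would typically be attached to a job with StreamExecutionEnvironment.fromSource(...). For columnar or binary inputs, the analogous entry point is FileSource.forBulkFileFormat(...), which takes a BulkFormat in place of the stream format.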
Theoretical Basis
// Abstract algorithm
1. Select format type (stream or bulk) and specify input paths
2. System auto-selects enumerator based on format splittability:
   - Splittable format -> BlockSplittingRecursiveEnumerator
   - Non-splittable format -> NonSplittingRecursiveEnumerator
3. System defaults to LocalityAwareSplitAssigner
4. Optionally configure continuous monitoring interval
5. Build the immutable FileSource instance
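
The default-selection steps above can be modeled without any Flink dependency. The following is a simplified, hypothetical model of the decision logic, not Flink's actual implementation: the class FileSourceDefaultsSketch and its nested Config and Builder types are invented for illustration, and the enumerator and assigner are represented only by their names as strings.

```java
import java.time.Duration;

public class FileSourceDefaultsSketch {

    /** Immutable result of the builder (step 5). */
    static final class Config {
        final String enumerator;
        final String assigner;
        final Duration discoveryInterval; // null means bounded (static file set)

        Config(String enumerator, String assigner, Duration discoveryInterval) {
            this.enumerator = enumerator;
            this.assigner = assigner;
            this.discoveryInterval = discoveryInterval;
        }

        boolean isBounded() {
            return discoveryInterval == null;
        }
    }

    /** Hypothetical builder that mirrors steps 1 to 5. */
    static final class Builder {
        private final boolean splittable;   // step 1: a property of the chosen format
        private Duration discoveryInterval; // stays null unless continuous mode is requested

        Builder(boolean splittable) {
            this.splittable = splittable;
        }

        /** Step 4: switch to continuous mode with a positive discovery interval. */
        Builder monitorContinuously(Duration interval) {
            if (interval == null || interval.isZero() || interval.isNegative()) {
                throw new IllegalArgumentException("discovery interval must be positive");
            }
            this.discoveryInterval = interval;
            return this;
        }

        /** Switch back to bounded mode. */
        Builder processStaticFileSet() {
            this.discoveryInterval = null;
            return this;
        }

        Config build() {
            // Step 2: the enumerator default depends on format splittability.
            String enumerator = splittable
                    ? "BlockSplittingRecursiveEnumerator"
                    : "NonSplittingRecursiveEnumerator";
            // Step 3: the split assigner default is the same in both cases.
            return new Config(enumerator, "LocalityAwareSplitAssigner", discoveryInterval);
        }
    }

    public static void main(String[] args) {
        Config config = new Builder(true)
                .monitorContinuously(Duration.ofSeconds(10))
                .build();
        System.out.println(config.enumerator + ", " + config.assigner
                + ", bounded=" + config.isBounded());
    }
}
```

Keeping the defaults inside build() means callers state only the format and the mode, while the result stays immutable once constructed.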