Implementation: Apache Flink FileSource forRecordStreamFormat
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, File_IO |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete entry point for constructing file source connectors from the record-stream or bulk format readers provided by the Apache Flink flink-connector-files module.
Description
The FileSource class provides two static factory methods. forRecordStreamFormat accepts a StreamFormat<T> and input paths, internally adapting it to a BulkFormat via StreamFormatAdapter. forBulkFileFormat accepts a BulkFormat<T, FileSourceSplit> directly. Both return a FileSourceBuilder that auto-selects the file enumerator based on format splittability and defaults to LocalityAwareSplitAssigner.
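The adaptation that forRecordStreamFormat performs can be illustrated with a minimal, self-contained sketch. The interfaces and names below are simplified stand-ins for StreamFormat, BulkFormat, and StreamFormatAdapter, not the Flink API itself: a record-by-record reader is wrapped so it satisfies a batch-oriented contract.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the adapter idea behind StreamFormatAdapter. All names here
// are simplified stand-ins, not Flink classes.
public class AdapterSketch {

    // Stand-in for StreamFormat: yields one record at a time, null at EOF.
    interface RecordReader<T> {
        T readRecord();
    }

    // Stand-in for BulkFormat: yields a batch of records per call.
    interface BatchReader<T> {
        List<T> readBatch(int maxRecords);
    }

    // Adapter: loops the per-record reader to fill a batch, the same shape
    // of adaptation forRecordStreamFormat performs internally.
    static <T> BatchReader<T> adapt(RecordReader<T> reader) {
        return maxRecords -> {
            List<T> batch = new ArrayList<>();
            T record;
            while (batch.size() < maxRecords && (record = reader.readRecord()) != null) {
                batch.add(record);
            }
            return batch;
        };
    }

    public static void main(String[] args) {
        String[] lines = {"a", "b", "c"};
        int[] pos = {0};
        RecordReader<String> perRecord = () -> pos[0] < lines.length ? lines[pos[0]++] : null;
        BatchReader<String> batched = adapt(perRecord);
        System.out.println(batched.readBatch(2)); // first batch of up to 2 records
        System.out.println(batched.readBatch(2)); // remaining records
    }
}
```

Because both factory methods funnel into the bulk path, downstream machinery (split readers, enumerators) only ever deals with one reader contract.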
Usage
Import these factory methods to create file sources for Flink pipelines. Use forRecordStreamFormat for readers that decode one record at a time (e.g., text lines, Avro Parquet records) and forBulkFileFormat for readers that produce whole batches (e.g., columnar Parquet/ORC formats).
Code Reference
Source Location
- Repository: Apache Flink
- File: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java
- Lines: L181-199
Signature
```java
@PublicEvolving
public final class FileSource<T> extends AbstractFileSource<T, FileSourceSplit>
        implements DynamicParallelismInference {

    public static <T> FileSourceBuilder<T> forRecordStreamFormat(
            final StreamFormat<T> streamFormat, final Path... paths) {
        return forBulkFileFormat(new StreamFormatAdapter<>(streamFormat), paths);
    }

    public static <T> FileSourceBuilder<T> forBulkFileFormat(
            final BulkFormat<T, FileSourceSplit> bulkFormat, final Path... paths) {
        checkNotNull(bulkFormat, "reader");
        checkNotNull(paths, "paths");
        checkArgument(paths.length > 0, "paths must not be empty");
        return new FileSourceBuilder<>(paths, bulkFormat);
    }
}
```
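The guard clauses in forBulkFileFormat follow a common fail-fast pattern: validate every argument before any state is constructed. The sketch below re-implements the Guava-style checkNotNull/checkArgument helpers so the example is self-contained; describeSource is a hypothetical stand-in for the factory method, not Flink API.

```java
// Sketch of the fail-fast guard-clause pattern used by forBulkFileFormat.
// The check helpers are re-implemented here so the example runs standalone.
public class GuardSketch {

    static <T> T checkNotNull(T ref, String name) {
        if (ref == null) throw new NullPointerException(name);
        return ref;
    }

    static void checkArgument(boolean cond, String msg) {
        if (!cond) throw new IllegalArgumentException(msg);
    }

    // Hypothetical stand-in mirroring the factory's validation: the format
    // and paths must be non-null, and at least one path must be supplied.
    static String describeSource(Object format, String... paths) {
        checkNotNull(format, "reader");
        checkNotNull(paths, "paths");
        checkArgument(paths.length > 0, "paths must not be empty");
        return format + " over " + paths.length + " path(s)";
    }

    public static void main(String[] args) {
        System.out.println(describeSource("csv-format", "/data/a", "/data/b"));
    }
}
```

Failing at the factory call, rather than at job submission or first split read, surfaces misconfiguration where it was introduced.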
Import
```java
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.StreamFormat;
import org.apache.flink.connector.file.src.reader.BulkFormat;
import org.apache.flink.core.fs.Path;
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| streamFormat | StreamFormat<T> | Yes (stream) | Record-by-record reader |
| bulkFormat | BulkFormat<T, FileSourceSplit> | Yes (bulk) | Batch reader (e.g., Parquet) |
| paths | Path... | Yes | Input file or directory paths |
Outputs
| Name | Type | Description |
|---|---|---|
| builder | FileSourceBuilder<T> | Configurable builder supporting .monitorContinuously(), .processStaticFileSet(), .build() |
| source | FileSource<T> | Final source instance (after .build()) |
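The builder's two modes can be sketched with a minimal fluent builder: processStaticFileSet yields a bounded, one-shot scan, while monitorContinuously switches the source to unbounded periodic discovery. SourceBuilder below is a simplified stand-in, not the Flink FileSourceBuilder.

```java
import java.time.Duration;

// Sketch of the bounded/continuous builder contract. SourceBuilder is a
// simplified stand-in for FileSourceBuilder, not Flink API.
public class BuilderSketch {

    static final class SourceBuilder {
        private Duration interval; // null = bounded, one-shot scan

        SourceBuilder monitorContinuously(Duration interval) {
            this.interval = interval;
            return this;
        }

        SourceBuilder processStaticFileSet() {
            this.interval = null;
            return this;
        }

        String build() {
            return interval == null
                    ? "BOUNDED source (static file set)"
                    : "CONTINUOUS source, scan every " + interval.toSeconds() + "s";
        }
    }

    public static void main(String[] args) {
        System.out.println(new SourceBuilder().processStaticFileSet().build());
        System.out.println(new SourceBuilder().monitorContinuously(Duration.ofSeconds(30)).build());
    }
}
```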
Usage Examples
Reading Text Files
```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

FileSource<String> source = FileSource
        .forRecordStreamFormat(new TextLineInputFormat(), new Path("/input/logs/"))
        .monitorContinuously(Duration.ofSeconds(30))
        .build();

DataStream<String> stream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
```
Reading Parquet Files
```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;

// `schema` is an Avro Schema describing the Parquet records, defined elsewhere.
Schema schema = ...;

FileSource<GenericRecord> source = FileSource
        // AvroParquetReaders.forGenericRecord returns a StreamFormat, so it
        // goes through forRecordStreamFormat, not forBulkFileFormat.
        .forRecordStreamFormat(AvroParquetReaders.forGenericRecord(schema), new Path("/input/parquet/"))
        .processStaticFileSet()
        .build();
```