Implementation:Apache Flink GeneratorFunction
| Knowledge Sources | |
|---|---|
| Domains | Connectors, Source_Framework |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A functional interface for data generator functions that transform input elements (typically Long indices) into output records, serving as the core building block of the DataGeneratorSource.
Description
GeneratorFunction is an experimental interface that extends Flink's Function base interface. It defines the contract for element-wise transformations used by the DataGeneratorSource to drive data generation. The source supplies sequential Long index values, and the generator function maps each index to a concrete output element.
The interface provides three methods:
- open(SourceReaderContext): An optional initialization method called once before data generation begins. Implementations can use this to access reader context information such as subtask index and parallelism.
- map(T value): The core transformation method that converts an input value into an output record. This is the only abstract method, making the interface a functional interface suitable for lambda expressions.
- close(): An optional tear-down method for resource cleanup.
Key design aspects:
- As a @FunctionalInterface-compatible interface (single abstract method), it supports concise lambda syntax for simple transformations.
- The open() method receives a SourceReaderContext, enabling implementations to customize behavior based on subtask identity or parallelism.
- Extends Function (which extends Serializable), ensuring that implementations can be serialized and shipped to cluster nodes.
Usage
Use GeneratorFunction to define the data transformation logic for a DataGeneratorSource. For simple cases, use a lambda expression. For complex cases requiring initialization or state, create a class that implements this interface and overrides open() and/or close().
Code Reference
Source Location
- Repository: Apache_Flink
- File:
flink-connectors/flink-connector-datagen/src/main/java/org/apache/flink/connector/datagen/source/GeneratorFunction.java - Lines: 1-55
Signature
@Experimental
public interface GeneratorFunction<T, O> extends Function
Import
import org.apache.flink.connector.datagen.source.GeneratorFunction;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| value (map input) | T | Yes | The input element to transform, typically a Long index supplied by the DataGeneratorSource |
| readerContext (open input) | SourceReaderContext | No | Context object providing information about the source reader (subtask index, parallelism, etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| map result | O | The output record produced by transforming the input element |
Usage Examples
// Simple lambda usage
GeneratorFunction<Long, String> generatorFunction = index -> "Number: " + index;
DataGeneratorSource<String> source =
new DataGeneratorSource<>(generatorFunction, 1000, Types.STRING);
// Class-based usage with initialization
public class CustomGenerator implements GeneratorFunction<Long, MyRecord> {
private int subtaskIndex;
@Override
public void open(SourceReaderContext readerContext) {
this.subtaskIndex = readerContext.getIndexOfSubtask();
}
@Override
public MyRecord map(Long index) {
return new MyRecord(subtaskIndex, index);
}
}