Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Apache Flink GeneratorFunction

From Leeroopedia


Knowledge Sources
Domains Connectors, Source_Framework
Last Updated 2026-02-09 00:00 GMT

Overview

A functional interface for data generator functions that transform input elements (typically Long indices) into output records, serving as the core building block of the DataGeneratorSource.

Description

GeneratorFunction is an experimental interface that extends Flink's Function base interface. It defines the contract for element-wise transformations used by the DataGeneratorSource to drive data generation. The source supplies sequential Long index values, and the generator function maps each index to a concrete output element.

The interface provides three methods:

  • open(SourceReaderContext): An optional initialization method called once before data generation begins. Implementations can use this to access reader context information such as subtask index and parallelism.
  • map(T value): The core transformation method that converts an input value into an output record. This is the only abstract method, making the interface a functional interface suitable for lambda expressions.
  • close(): An optional tear-down method for resource cleanup.

Key design aspects:

  • As a @FunctionalInterface-compatible interface (single abstract method), it supports concise lambda syntax for simple transformations.
  • The open() method receives a SourceReaderContext, enabling implementations to customize behavior based on subtask identity or parallelism.
  • Extends Function (which extends Serializable), ensuring that implementations can be serialized and shipped to cluster nodes.

Usage

Use GeneratorFunction to define the data transformation logic for a DataGeneratorSource. For simple cases, use a lambda expression. For complex cases requiring initialization or state, create a class that implements this interface and overrides open() and/or close().

Code Reference

Source Location

  • Repository: Apache_Flink
  • File: flink-connectors/flink-connector-datagen/src/main/java/org/apache/flink/connector/datagen/source/GeneratorFunction.java
  • Lines: 1-55

Signature

@Experimental
public interface GeneratorFunction<T, O> extends Function

Import

import org.apache.flink.connector.datagen.source.GeneratorFunction;

I/O Contract

Inputs

Name Type Required Description
value (map input) T Yes The input element to transform, typically a Long index supplied by the DataGeneratorSource
readerContext (open input) SourceReaderContext No Context object providing information about the source reader (subtask index, parallelism, etc.)

Outputs

Name Type Description
map result O The output record produced by transforming the input element

Usage Examples

// Simple lambda usage
GeneratorFunction<Long, String> generatorFunction = index -> "Number: " + index;
DataGeneratorSource<String> source =
        new DataGeneratorSource<>(generatorFunction, 1000, Types.STRING);

// Class-based usage with initialization
public class CustomGenerator implements GeneratorFunction<Long, MyRecord> {
    private int subtaskIndex;

    @Override
    public void open(SourceReaderContext readerContext) {
        this.subtaskIndex = readerContext.getIndexOfSubtask();
    }

    @Override
    public MyRecord map(Long index) {
        return new MyRecord(subtaskIndex, index);
    }
}

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment