Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Heibaiying BigData Notes DataStream Transformations

From Leeroopedia


Knowledge Sources
Domains Stream_Processing, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

Concrete tools for applying transformations to Flink DataStreams provided by the Apache Flink DataStream API.

Description

The Flink DataStream API provides a set of transformation methods on DataStream objects that enable developers to express processing logic declaratively. Each transformation accepts a user-defined function (UDF) and returns a new DataStream with the transformation applied. The primary transformations are:

  • map -- Takes a MapFunction<T, R> and produces a DataStream<R> with exactly one output per input.
  • flatMap -- Takes a FlatMapFunction<T, R> and produces a DataStream<R> with zero or more outputs per input.
  • filter -- Takes a FilterFunction<T> and produces a DataStream<T> containing only elements that satisfy the predicate.
  • keyBy -- Takes a key selector (field index, field name, or KeySelector function) and produces a KeyedStream partitioned by key.
  • window -- Applied on a KeyedStream, takes a WindowAssigner and produces a WindowedStream for windowed aggregations.

In the BigData-Notes repository, the KafkaStreamingJob demonstrates a map transformation on a Kafka-sourced stream, and the Flink_Data_Transformation notes provide comprehensive coverage of all available transformations with examples.

Usage

Chain these transformation methods on a DataStream object to build the processing pipeline. Transformations are applied lazily and only execute when env.execute() is called. Use lambda expressions or anonymous inner classes to define the transformation logic.

Code Reference

Source Location

  • Repository: BigData-Notes
  • File: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/KafkaStreamingJob.java (line 42, map example)
  • File: notes/Flink_Data_Transformation.md (lines 1-314, comprehensive transformation reference)

Signature

// map transformation
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper)

// flatMap transformation
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper)

// filter transformation
public SingleOutputStreamOperator<T> filter(FilterFunction<T> filter)

// keyBy transformation (by field index)
public KeyedStream<T, Tuple> keyBy(int... fields)

// keyBy transformation (by KeySelector)
public <K> KeyedStream<T, K> keyBy(KeySelector<T, K> key)

// window transformation (on KeyedStream)
public <W extends Window> WindowedStream<T, K, W> window(WindowAssigner<? super T, W> assigner)

Import

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

I/O Contract

Inputs

Name Type Required Description
mapper MapFunction<T, R> Yes (for map) A function that takes one element of type T and returns one element of type R.
flatMapper FlatMapFunction<T, R> Yes (for flatMap) A function that takes one element of type T and emits zero or more elements of type R via a Collector.
filter FilterFunction<T> Yes (for filter) A boolean predicate function; elements returning true are retained.
key int or KeySelector<T, K> Yes (for keyBy) A field index or key extraction function that determines the grouping key.
assigner WindowAssigner<? super T, W> Yes (for window) Defines the windowing strategy (tumbling, sliding, session, or global windows).

Outputs

Name Type Description
result (map) DataStream<R> A new stream where each element is the result of applying the map function to the corresponding input element.
result (flatMap) DataStream<R> A new stream containing all elements emitted by the flatMap function across all input elements.
result (filter) DataStream<T> A new stream containing only the input elements that satisfied the filter predicate.
result (keyBy) KeyedStream<T, K> A logically partitioned stream where all elements with the same key are directed to the same operator subtask.
result (window) WindowedStream<T, K, W> A windowed stream ready for aggregation operations (reduce, aggregate, apply).

Usage Examples

Basic Usage

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// Map: convert each line to uppercase
DataStream<String> upperCased = stream.map(line -> line.toUpperCase());

// FlatMap: split each line into words and emit (word, 1) tuples
DataStream<Tuple2<String, Integer>> wordCounts = stream
    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split("\\s+")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    });

// Filter: retain only non-empty strings
DataStream<String> nonEmpty = stream.filter(line -> !line.isEmpty());

// KeyBy + Window + Sum: word count over 5-second tumbling windows
DataStream<Tuple2<String, Integer>> windowedCounts = wordCounts
    .keyBy(0)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .sum(1);

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment