Implementation:Apache Beam Bundle

From Leeroopedia


Knowledge Sources
Domains Data_Processing, Runner_Infrastructure
Last Updated 2026-02-09 00:00 GMT

Overview

Interface representing an immutable collection of windowed elements that belong to a single PCollection in the Apache Beam local runner.

Description

The Bundle interface defines the contract for an immutable, iterable collection of WindowedValue elements that are part of a single PCollection. It serves as the unit of data transfer between pipeline stages in the local runner, carrying not only elements but also watermark and key metadata. Each bundle tracks its associated PCollection, the structural key from the most recent GroupByKey, the minimum element timestamp, and the synchronized processing output watermark.
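The description above can be illustrated with a minimal, self-contained sketch. It does not use Beam's classes: `StampedValue` and `FrozenBundle` are hypothetical stand-ins for `WindowedValue` and a bundle implementation, `java.time.Instant` replaces the Joda-Time `Instant` used by the real interface, and a plain `String` and `Object` stand in for the PCollection and `StructuralKey<?>`. The fallback to `Instant.MAX` for an empty bundle is an assumption made for this sketch only.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for WindowedValue: a value with its event timestamp. */
final class StampedValue<T> {
    final T value;
    final Instant timestamp;

    StampedValue(T value, Instant timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

/** Hypothetical simplified bundle: an immutable snapshot of elements plus metadata. */
final class FrozenBundle<T> implements Iterable<StampedValue<T>> {
    private final String collectionName; // stands in for the owning PCollection
    private final Object key;            // stands in for StructuralKey<?>
    private final List<StampedValue<T>> elements;
    private final Instant minimumTimestamp;
    private final Instant synchronizedProcessingOutputWatermark;

    FrozenBundle(String collectionName, Object key, List<StampedValue<T>> elements,
                 Instant synchronizedProcessingOutputWatermark) {
        this.collectionName = collectionName;
        this.key = key;
        // Copy, then wrap: later changes to the caller's list cannot leak in,
        // and the iterator rejects remove().
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
        Instant min = Instant.MAX; // assumption: an empty bundle reports the maximum instant
        for (StampedValue<T> element : this.elements) {
            if (element.timestamp.isBefore(min)) {
                min = element.timestamp;
            }
        }
        this.minimumTimestamp = min;
        this.synchronizedProcessingOutputWatermark = synchronizedProcessingOutputWatermark;
    }

    String getCollectionName() { return collectionName; }

    Object getKey() { return key; }

    Instant getMinimumTimestamp() { return minimumTimestamp; }

    Instant getSynchronizedProcessingOutputWatermark() {
        return synchronizedProcessingOutputWatermark;
    }

    @Override
    public Iterator<StampedValue<T>> iterator() {
        return elements.iterator();
    }
}
```

Computing the metadata once in the constructor keeps the getters cheap and consistent with the frozen element list, which is one plausible way to honor the immutability the interface describes.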

Usage

Use this interface when implementing or interacting with the local runner's internal data transport layer. It is the abstraction consumed by BundleFactory, TransformEvaluator, and ExecutorServiceParallelExecutor to pass data between pipeline stages. It is a runner-internal API, so end users do not typically import it.

Code Reference

Source Location

Signature

public interface Bundle<T, CollectionT> extends Iterable<WindowedValue<T>> {

    /** Returns the PCollection that the elements of this bundle belong to. */
    @Nullable
    CollectionT getPCollection();

    /**
     * Returns the key that was output in the most recent GroupByKey
     * in the execution of this bundle.
     */
    StructuralKey<?> getKey();

    /**
     * Return the minimum timestamp among elements in this bundle.
     */
    Instant getMinimumTimestamp();

    /**
     * Returns the processing time output watermark at the time the
     * producing Executable committed this bundle.
     */
    Instant getSynchronizedProcessingOutputWatermark();
}

Import

import org.apache.beam.runners.local.Bundle;

I/O Contract

Inputs

Name | Type | Required | Description
T | Type Parameter | Yes | The element type contained in the bundle
CollectionT | Type Parameter | Yes | The PCollection type this bundle belongs to

Outputs

Name | Type | Description
getPCollection() | CollectionT (nullable) | The PCollection that owns these elements
getKey() | StructuralKey<?> | The structural key from the most recent GroupByKey
getMinimumTimestamp() | Instant | The earliest timestamp among all elements in the bundle
getSynchronizedProcessingOutputWatermark() | Instant | The processing time output watermark when the bundle was committed
iterator() | Iterator<WindowedValue<T>> | Inherited from Iterable; iterates over windowed elements
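The contract for getMinimumTimestamp() can be read as "the smallest timestamp you would find by iterating the bundle". The following sketch checks that reading on plain timestamps. `MinimumTimestampSketch` is a hypothetical helper, `java.time.Instant` stands in for the Joda-Time `Instant` of the real interface, and returning `Instant.MAX` for an empty input is an assumption of this sketch, not documented behavior.

```java
import java.time.Instant;

final class MinimumTimestampSketch {
    private MinimumTimestampSketch() {}

    /**
     * Returns the earliest of the given element timestamps.
     * Assumption: an empty input yields Instant.MAX, so any real
     * element timestamp compares as earlier.
     */
    static Instant minimumTimestamp(Iterable<Instant> elementTimestamps) {
        Instant min = Instant.MAX;
        for (Instant timestamp : elementTimestamps) {
            if (timestamp.isBefore(min)) {
                min = timestamp;
            }
        }
        return min;
    }
}
```

For example, passing timestamps at 30 ms, 5 ms, and 12 ms after the epoch returns the 5 ms instant, which is the value a bundle holding those three elements would be expected to report.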

Usage Examples

Consuming a Bundle in a Transform Evaluator

import org.apache.beam.runners.local.Bundle;
import org.apache.beam.runners.local.StructuralKey;
import org.apache.beam.sdk.values.WindowedValue;
import org.joda.time.Instant;

// Within a TransformEvaluator processing a bundle:
public <T> void processBundle(Bundle<T, ?> inputBundle) {
    // Access the structural key for GroupByKey context
    StructuralKey<?> key = inputBundle.getKey();

    // Read timestamp and watermark metadata for downstream scheduling
    Instant minTimestamp = inputBundle.getMinimumTimestamp();
    Instant watermark = inputBundle.getSynchronizedProcessingOutputWatermark();

    // Iterate over all windowed elements in the bundle
    for (WindowedValue<T> element : inputBundle) {
        // Process each element with its windows and timestamp.
        // processElement is a placeholder for evaluator-specific logic.
        processElement(element);
    }
}

Related Pages
