Implementation:Apache Beam Bundle
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Runner_Infrastructure |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete interface for representing an immutable collection of windowed elements belonging to a PCollection within the Apache Beam local runner.
Description
The Bundle interface defines the contract for an immutable, iterable collection of WindowedValue elements that are part of a single PCollection. It serves as the unit of data transfer between pipeline stages in the local runner, carrying not only elements but also watermark and key metadata. Each bundle tracks its associated PCollection, the structural key from the most recent GroupByKey, the minimum element timestamp, and the synchronized processing output watermark.
Usage
Use this interface when implementing or interacting with the local runner's internal data transport layer. It is the abstraction consumed by BundleFactory, TransformEvaluator, and ExecutorServiceParallelExecutor to pass data between pipeline stages. Not typically imported by end users; it is a runner-internal API.
Code Reference
Source Location
- Repository: Apache_Beam
- File: runners/local-java/src/main/java/org/apache/beam/runners/local/Bundle.java
- Lines: 18-54
Signature
public interface Bundle<T, CollectionT> extends Iterable<WindowedValue<T>> {
/** Returns the PCollection that the elements of this bundle belong to. */
@Nullable
CollectionT getPCollection();
/**
* Returns the key that was output in the most recent GroupByKey
* in the execution of this bundle.
*/
StructuralKey<?> getKey();
/**
* Return the minimum timestamp among elements in this bundle.
*/
Instant getMinimumTimestamp();
/**
* Returns the processing time output watermark at the time the
* producing Executable committed this bundle.
*/
Instant getSynchronizedProcessingOutputWatermark();
}
Import
import org.apache.beam.runners.local.Bundle;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| T | Type Parameter | Yes | The element type contained in the bundle |
| CollectionT | Type Parameter | Yes | The PCollection type this bundle belongs to |
Outputs
| Name | Type | Description |
|---|---|---|
| getPCollection() | CollectionT (nullable) | The PCollection that owns these elements |
| getKey() | StructuralKey<?> | The structural key from the most recent GroupByKey |
| getMinimumTimestamp() | Instant | The earliest timestamp among all elements in the bundle |
| getSynchronizedProcessingOutputWatermark() | Instant | The processing time output watermark when the bundle was committed |
| iterator() | Iterator<WindowedValue<T>> | Inherited from Iterable; iterates over windowed elements |
Usage Examples
Consuming a Bundle in a Transform Evaluator
import org.apache.beam.runners.local.Bundle;
import org.apache.beam.sdk.values.WindowedValue;
import org.joda.time.Instant;
// Within a TransformEvaluator processing a bundle:
public <T> void processBundle(Bundle<T, ?> inputBundle) {
// Access the structural key for GroupByKey context
StructuralKey<?> key = inputBundle.getKey();
// Get watermark information for downstream scheduling
Instant minTimestamp = inputBundle.getMinimumTimestamp();
Instant watermark = inputBundle.getSynchronizedProcessingOutputWatermark();
// Iterate over all windowed elements in the bundle
for (WindowedValue<T> element : inputBundle) {
// Process each element with its windows and timestamp
processElement(element);
}
}