Principle:Apache Beam Pipeline Configuration Twister2
| Attribute | Value |
|---|---|
| Principle Name | Pipeline Configuration Twister2 |
| Domain | Configuration_Management, HPC |
| Description | Configuration mechanism for specifying Twister2-specific pipeline execution parameters and runner registration through Java service loading |
| Deprecation Notice | The Twister2 Runner is deprecated and scheduled for removal in Apache Beam 3.0 |
| last_updated | 2026-02-09 04:00 GMT |
Overview
Pipeline Configuration Twister2 defines the configuration mechanism for specifying Twister2-specific pipeline execution parameters and runner registration through Java's service loading infrastructure. This principle governs how the Twister2 runner is discovered, registered, and configured by the Beam SDK before pipeline execution on HPC clusters.
Note: The Twister2 Runner is deprecated and is scheduled for removal in Apache Beam 3.0. Users should plan migration to an actively maintained runner.
Description
Twister2 pipeline configuration defines runner-specific parameters such as parallelism, cluster type, Twister2 home directory, worker resource allocation, and job packaging format. The configuration is expressed as a Java interface (Twister2PipelineOptions) that extends the standard Beam PipelineOptions, StreamingOptions, and FileStagingOptions interfaces.
The runner registrar uses Google's AutoService annotation processor, together with Java's ServiceLoader, to make Twister2Runner discoverable by PipelineOptionsFactory. Users can then set --runner=Twister2Runner in their options without referencing the runner class in their own code. The Twister2RunnerRegistrar class contains two nested classes annotated with @AutoService:
- Runner registrar -- registers Twister2Runner and Twister2TestRunner as available pipeline runners
- Options registrar -- registers Twister2PipelineOptions as a recognized options interface
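The lookup that a runner registrar enables can be illustrated with a JDK-only sketch. All type names below (SketchRunner, SketchRunnerRegistrar, RunnerLookupSketch, and so on) are hypothetical stand-ins and are not Beam classes. The registrar list is hard-coded here so the example runs from a single file; in a real SPI setup it would be obtained from java.util.ServiceLoader.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: resolves a runner class from a name such as the value of --runner.
final class RunnerLookupSketch {
  // Assumption: stands in for the registrars that ServiceLoader would discover at runtime.
  static final List<SketchRunnerRegistrar> REGISTRARS = List.of(new SketchTwister2Registrar());

  // Returns the first registered runner whose simple class name matches the given name.
  static Optional<Class<? extends SketchRunner>> resolve(String name) {
    for (SketchRunnerRegistrar registrar : REGISTRARS) {
      for (Class<? extends SketchRunner> runner : registrar.getRunners()) {
        if (runner.getSimpleName().equals(name)) {
          return Optional.of(runner);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(resolve("SketchTwister2Runner").isPresent()); // prints true
    System.out.println(resolve("UnknownRunner").isPresent()); // prints false
  }
}

// Hypothetical marker type for runners.
interface SketchRunner {}

final class SketchTwister2Runner implements SketchRunner {}

final class SketchTwister2TestRunner implements SketchRunner {}

// Plays the role of a runner registrar: it only lists runner classes.
interface SketchRunnerRegistrar {
  List<Class<? extends SketchRunner>> getRunners();
}

final class SketchTwister2Registrar implements SketchRunnerRegistrar {
  @Override
  public List<Class<? extends SketchRunner>> getRunners() {
    return List.of(SketchTwister2Runner.class, SketchTwister2TestRunner.class);
  }
}
```

The client code only ever passes a name to resolve; it never references a concrete runner class, which is the decoupling the registrar is meant to provide.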
The key configuration parameters include:
| Parameter | Default | Description |
|---|---|---|
| parallelism | 1 | Number of parallel workers for Twister2 execution |
| clusterType | "standalone" | Cluster deployment type: standalone, nomad, kubernetes, or mesos |
| twister2Home | (none) | Path to the Twister2 installation directory; if empty, local mode is used |
| workerCPUs | 2 | Number of CPU cores allocated per worker |
| ramMegaBytes | 2048 | RAM in megabytes allocated per worker |
| jobType | "java_zip" | Job packaging format (jar or java_zip) |
| tSetEnvironment | (none) | The Twister2 TSetEnvironment for execution (set internally) |
Usage
This configuration mechanism is required whenever executing Beam pipelines on Twister2 HPC clusters. Users configure the cluster type (standalone, nomad, kubernetes, mesos), set the desired parallelism, and specify the Twister2 installation path. If twister2Home is not set or is empty, the runner defaults to local mode with a single worker.
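The local-mode fallback described above can be written down as a small decision rule. This is a minimal sketch and not the runner's actual code: the class and method names are hypothetical, and treating "a single worker" as an effective parallelism of 1 is an assumption made for illustration.

```java
// Hypothetical sketch of the local-mode rule; not taken from the Twister2 runner source.
final class LocalModeSketch {
  // Local mode applies when twister2Home is unset (null) or empty.
  static boolean isLocalMode(String twister2Home) {
    return twister2Home == null || twister2Home.isEmpty();
  }

  // Assumption: local mode forces a single worker regardless of the requested parallelism.
  static int effectiveParallelism(String twister2Home, int requestedParallelism) {
    return isLocalMode(twister2Home) ? 1 : requestedParallelism;
  }

  public static void main(String[] args) {
    System.out.println(isLocalMode("")); // prints true
    System.out.println(effectiveParallelism(null, 4)); // prints 1
    System.out.println(effectiveParallelism("/opt/twister2", 4)); // prints 4
  }
}
```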
Example usage via command line:
--runner=Twister2Runner \
--parallelism=4 \
--clusterType=standalone \
--twister2Home=/opt/twister2 \
--workerCPUs=4 \
--ramMegaBytes=4096
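Flags of this form are name/value pairs keyed by the option's property name. In Beam the actual parsing is done by PipelineOptionsFactory (for example through fromArgs); the following is only a simplified JDK-only sketch of the idea, with a hypothetical class name, and it is stricter than Beam in that every flag must carry a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits "--name=value" arguments into a property map.
final class FlagParsingSketch {
  static Map<String, String> parse(String[] args) {
    Map<String, String> flags = new LinkedHashMap<>();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (!arg.startsWith("--") || eq < 3) {
        // Simplification: this sketch rejects anything that is not --name=value.
        throw new IllegalArgumentException("Expected --name=value but got: " + arg);
      }
      flags.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    return flags;
  }

  public static void main(String[] args) {
    String[] example = {"--runner=Twister2Runner", "--parallelism=4", "--clusterType=standalone"};
    // Prints each parsed property name with its raw string value.
    parse(example).forEach((name, value) -> System.out.println(name + " -> " + value));
  }
}
```

Values stay as strings in this sketch; converting "4" to an int for a property such as parallelism would be a separate step driven by the getter's return type.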
Example programmatic configuration:
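import org.apache.beam.runners.twister2.Twister2PipelineOptions;
import org.apache.beam.runners.twister2.Twister2Runner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

Twister2PipelineOptions options = PipelineOptionsFactory.as(Twister2PipelineOptions.class);
// Select the runner explicitly; otherwise the pipeline uses the default runner.
options.setRunner(Twister2Runner.class);
options.setParallelism(4);
options.setClusterType("standalone");
options.setTwister2Home("/opt/twister2");
options.setWorkerCPUs(4);
options.setRamMegaBytes(4096);
Pipeline pipeline = Pipeline.create(options);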
The @AutoService annotation on the registrar classes causes those registrars to be listed in META-INF/services resource files during compilation. At runtime, Beam's PipelineOptionsFactory loads the registrars through Java's ServiceLoader and learns about Twister2Runner and Twister2PipelineOptions from them.
Theoretical Basis
This principle is based on the Service Provider Interface (SPI) pattern in Java. Runner modules publish registrar classes via @AutoService annotations, which allows the SDK to discover runners at runtime without a compile-time dependency on any particular runner. The SPI pattern decouples the client code (which simply specifies --runner=Twister2Runner) from the runner implementation, consistent with the Dependency Inversion Principle.
The configuration interface itself follows JavaBean-style property conventions within Beam's PipelineOptions framework, where getter/setter pairs define named properties with optional defaults. The @Default annotations supply default values, while the @Description annotations provide help text for each property.
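How an interface of getter/setter pairs can carry defaults without a hand-written implementation can be sketched with a JDK dynamic proxy. This is a simplified illustration only: the annotations DefaultInt and DefaultString and the interface SketchOptions are hypothetical, loosely modeled on the idea behind Beam's @Default annotations, and Beam's own implementation is considerably more involved.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

final class OptionsProxySketch {
  // Hypothetical default-value annotations, retained so they can be read at runtime.
  @Retention(RetentionPolicy.RUNTIME)
  @interface DefaultInt { int value(); }

  @Retention(RetentionPolicy.RUNTIME)
  @interface DefaultString { String value(); }

  // Hypothetical options interface with two properties and their defaults.
  interface SketchOptions {
    @DefaultInt(1)
    int getParallelism();
    void setParallelism(int value);

    @DefaultString("standalone")
    String getClusterType();
    void setClusterType(String value);
  }

  // Builds a proxy that stores set values in a map and falls back to annotation defaults.
  // Object methods such as toString and hashCode are not handled in this sketch.
  static SketchOptions create() {
    Map<String, Object> values = new HashMap<>();
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      String property = method.getName().substring(3); // strip "get" or "set"
      if (method.getName().startsWith("set")) {
        values.put(property, methodArgs[0]);
        return null;
      }
      if (values.containsKey(property)) {
        return values.get(property);
      }
      DefaultInt intDefault = method.getAnnotation(DefaultInt.class);
      if (intDefault != null) {
        return intDefault.value();
      }
      DefaultString stringDefault = method.getAnnotation(DefaultString.class);
      return stringDefault != null ? stringDefault.value() : null;
    };
    return (SketchOptions) Proxy.newProxyInstance(
        SketchOptions.class.getClassLoader(), new Class<?>[] {SketchOptions.class}, handler);
  }

  public static void main(String[] args) {
    SketchOptions options = create();
    System.out.println(options.getParallelism()); // prints 1 (annotation default)
    options.setParallelism(4);
    System.out.println(options.getParallelism()); // prints 4 (explicitly set)
    System.out.println(options.getClusterType()); // prints standalone
  }
}
```

Keeping defaults on the getter means the interface declaration alone documents each property's name, type, and fallback value.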
The separation of runner registration into a dedicated registrar class (rather than self-registration within the runner) follows the Single Responsibility Principle, keeping service discovery logic isolated from pipeline execution logic.
Related Pages
- Implementation:Apache_Beam_Twister2PipelineOptions -- Concrete configuration interface and auto-service registrar for Twister2 pipeline options
- Principle:Apache_Beam_Transform_Override_Application_Twister2 -- Transform override mechanism used after configuration
- Principle:Apache_Beam_Job_Submission_Twister2 -- Job submission that consumes the configured options