Principle:Apache Beam Pipeline Configuration Twister2

From Leeroopedia


Attribute | Value
Principle Name | Pipeline Configuration Twister2
Domain | Configuration_Management, HPC
Description | Configuration mechanism for specifying Twister2-specific pipeline execution parameters and runner registration through Java service loading
Deprecation Notice | The Twister2 Runner is deprecated and scheduled for removal in Apache Beam 3.0
last_updated | 2026-02-09 04:00 GMT

Overview

Pipeline Configuration Twister2 defines the configuration mechanism for specifying Twister2-specific pipeline execution parameters and runner registration through Java's service loading infrastructure. This principle governs how the Twister2 runner is discovered, registered, and configured by the Beam SDK before pipeline execution on HPC clusters.

Note: The Twister2 Runner is deprecated and is scheduled for removal in Apache Beam 3.0. Users should plan migration to an actively maintained runner.

Description

Twister2 pipeline configuration defines runner-specific parameters such as parallelism, cluster type, Twister2 home directory, worker resource allocation, and job packaging format. The configuration is expressed as a Java interface (Twister2PipelineOptions) that extends the standard Beam PipelineOptions, StreamingOptions, and FileStagingOptions interfaces.

The runner registrar uses Google's @AutoService annotation, which generates the provider-configuration files read by Java's ServiceLoader, to make Twister2Runner discoverable by PipelineOptionsFactory. Users can then set --runner=Twister2Runner in their options without referencing the runner class in their own code. The Twister2RunnerRegistrar class contains two inner classes annotated with @AutoService:

  • Runner registrar -- registers Twister2Runner and Twister2TestRunner as available pipeline runners
  • Options registrar -- registers Twister2PipelineOptions as a recognized options interface
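The role of the runner registrar can be illustrated with a minimal, JDK-only sketch. All type names here (DemoRunnerRegistrar, DemoTwister2Runner, DemoTwister2TestRunner, DemoTwister2Registrar, RunnerLookup) are hypothetical stand-ins, not Beam classes, and the registrars are passed in explicitly, whereas Beam discovers them through ServiceLoader. The sketch only shows roughly how a name such as the value of --runner could be mapped to a registered class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a runner registrar SPI; not the Beam interface.
interface DemoRunnerRegistrar {
  List<Class<?>> getPipelineRunners();
}

// Placeholder runner classes, for illustration only.
class DemoTwister2Runner {}

class DemoTwister2TestRunner {}

// Plays the role of the runner registrar: it only lists runner classes.
class DemoTwister2Registrar implements DemoRunnerRegistrar {
  @Override
  public List<Class<?>> getPipelineRunners() {
    return List.of(DemoTwister2Runner.class, DemoTwister2TestRunner.class);
  }
}

class RunnerLookup {
  // Index every registered runner by its simple class name.
  static Map<String, Class<?>> index(List<DemoRunnerRegistrar> registrars) {
    Map<String, Class<?>> byName = new LinkedHashMap<>();
    for (DemoRunnerRegistrar registrar : registrars) {
      for (Class<?> runner : registrar.getPipelineRunners()) {
        byName.put(runner.getSimpleName(), runner);
      }
    }
    return byName;
  }

  // Resolve a runner name to a class, failing loudly for unknown names.
  static Class<?> resolve(String runnerName, List<DemoRunnerRegistrar> registrars) {
    Class<?> runner = index(registrars).get(runnerName);
    if (runner == null) {
      throw new IllegalArgumentException("Unknown runner: " + runnerName);
    }
    return runner;
  }
}
```

Because the registrar only enumerates classes, the lookup code never needs a compile-time reference to any particular runner.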

The key configuration parameters include:

Parameter | Default | Description
parallelism | 1 | Number of parallel workers for Twister2 execution
clusterType | "standalone" | Cluster deployment type: standalone, nomad, kubernetes, or mesos
twister2Home | (none) | Path to the Twister2 installation directory; if empty, local mode is used
workerCPUs | 2 | Number of CPU cores allocated per worker
ramMegaBytes | 2048 | RAM in megabytes allocated per worker
jobType | "java_zip" | Job packaging format (jar or java_zip)
tSetEnvironment | (none) | The Twister2 TSetEnvironment for execution (set internally)
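As a rough illustration of how these parameters fit together, the sketch below holds the documented defaults in a plain class and adds two helper checks. The class Twister2Settings and its methods are hypothetical and exist only for this example; the real runner exposes these values through the Twister2PipelineOptions interface, and whether it validates them this way is an assumption.

```java
import java.util.Set;

// Hypothetical value holder for illustration; not part of the Twister2 runner.
class Twister2Settings {
  // Defaults copied from the parameter table.
  int parallelism = 1;
  String clusterType = "standalone";
  String twister2Home = null; // no default
  int workerCPUs = 2;
  int ramMegaBytes = 2048;
  String jobType = "java_zip";

  // Allowed values as listed in the parameter descriptions.
  static final Set<String> CLUSTER_TYPES = Set.of("standalone", "nomad", "kubernetes", "mesos");
  static final Set<String> JOB_TYPES = Set.of("jar", "java_zip");

  // Local mode applies when no Twister2 installation path is given.
  boolean isLocalMode() {
    return twister2Home == null || twister2Home.isEmpty();
  }

  // Assumed sanity checks; the real runner may validate differently or not at all.
  void validate() {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be at least 1");
    }
    if (!CLUSTER_TYPES.contains(clusterType)) {
      throw new IllegalArgumentException("Unsupported clusterType: " + clusterType);
    }
    if (!JOB_TYPES.contains(jobType)) {
      throw new IllegalArgumentException("Unsupported jobType: " + jobType);
    }
  }
}
```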

Usage

This configuration mechanism is required whenever executing Beam pipelines on Twister2 HPC clusters. Users configure the cluster type (standalone, nomad, kubernetes, mesos), set the desired parallelism, and specify the Twister2 installation path. If twister2Home is not set or is empty, the runner defaults to local mode with a single worker.

Example usage via command line:

--runner=Twister2Runner \
--parallelism=4 \
--clusterType=standalone \
--twister2Home=/opt/twister2 \
--workerCPUs=4 \
--ramMegaBytes=4096
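Flags in this --name=value form are normally handed to Beam via PipelineOptionsFactory.fromArgs(args), which also performs type conversion and validation. The following is only a simplified, JDK-only sketch of the splitting step; FlagParser is a hypothetical helper, and treating a bare flag as "true" is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits "--name=value" arguments into a name-to-value map.
class FlagParser {
  static Map<String, String> parse(String... args) {
    Map<String, String> flags = new LinkedHashMap<>();
    for (String arg : args) {
      if (!arg.startsWith("--")) {
        throw new IllegalArgumentException("Expected --name=value but got: " + arg);
      }
      int eq = arg.indexOf('=');
      if (eq < 0) {
        // Assumption for this sketch: a bare flag is stored as "true".
        flags.put(arg.substring(2), "true");
      } else {
        flags.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return flags;
  }
}
```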

Example programmatic configuration:

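import org.apache.beam.runners.twister2.Twister2PipelineOptions;
import org.apache.beam.runners.twister2.Twister2Runner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

Twister2PipelineOptions options = PipelineOptionsFactory.as(Twister2PipelineOptions.class);
// Select the runner explicitly; otherwise Beam falls back to its default runner.
options.setRunner(Twister2Runner.class);
options.setParallelism(4);
options.setClusterType("standalone");
options.setTwister2Home("/opt/twister2");
options.setWorkerCPUs(4);
options.setRamMegaBytes(4096);

Pipeline pipeline = Pipeline.create(options);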

The @AutoService annotation on the two registrar classes causes their class names to be written into META-INF/services resource files during compilation. At runtime, Beam's PipelineOptionsFactory loads these registrars through Java's ServiceLoader and thereby learns about Twister2Runner and Twister2PipelineOptions.

Theoretical Basis

This principle is based on the Service Provider Interface (SPI) pattern in Java. The runner module publishes its registrar classes via @AutoService annotations, and java.util.ServiceLoader discovers them at runtime, so client code needs no compile-time reference to the runner class; the runner only has to be on the runtime classpath. The SPI pattern decouples the client code (which simply specifies --runner=Twister2Runner) from the runner implementation, following the Dependency Inversion Principle.

The configuration interface itself follows the JavaBean property convention used by Beam's PipelineOptions framework: getter/setter pairs on an interface define named properties, and Beam supplies the implementation at runtime through a dynamic proxy. The @Default annotations provide values for unset properties, while the @Description annotations serve as self-documenting metadata.
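The idea of an options interface whose getters fall back to annotated defaults can be sketched with the JDK's own dynamic proxies. This is a simplified illustration, not Beam's implementation: the annotations DefaultInt and DefaultText, the interface DemoOptions, and the class OptionsProxy are all hypothetical names introduced for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical default-value annotations, readable at runtime via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DefaultInt { int value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DefaultText { String value(); }

// Hypothetical options interface: each getter/setter pair defines one property.
interface DemoOptions {
  @DefaultInt(1) int getParallelism();
  void setParallelism(int value);

  @DefaultText("standalone") String getClusterType();
  void setClusterType(String value);

  String getTwister2Home(); // no default: returns null until set
  void setTwister2Home(String value);
}

// Backs an options interface with a map; only getX/setX methods are supported.
class OptionsProxy implements InvocationHandler {
  private final Map<String, Object> values = new HashMap<>();

  static <T> T create(Class<T> iface) {
    Object proxy = Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, new OptionsProxy());
    return iface.cast(proxy);
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) {
    String name = method.getName();
    if (name.startsWith("set") && args != null && args.length == 1) {
      values.put(name.substring(3), args[0]);
      return null;
    }
    if (name.startsWith("get") && args == null) {
      String property = name.substring(3);
      if (values.containsKey(property)) {
        return values.get(property);
      }
      // Fall back to the annotated default, if any.
      DefaultInt intDefault = method.getAnnotation(DefaultInt.class);
      if (intDefault != null) {
        return intDefault.value();
      }
      DefaultText textDefault = method.getAnnotation(DefaultText.class);
      return textDefault != null ? textDefault.value() : null;
    }
    throw new UnsupportedOperationException(name);
  }
}
```

A caller would obtain an instance with OptionsProxy.create(DemoOptions.class); an unset property returns its annotated default, and a value stored through the setter takes precedence afterwards.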

The separation of runner registration into a dedicated registrar class (rather than self-registration within the runner) follows the Single Responsibility Principle, keeping service discovery logic isolated from pipeline execution logic.
