Principle: NeuML txtai Workflow Composition
| Knowledge Sources | Details |
|---|---|
| Domains | Data_Processing, Workflow, Pipeline |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Workflow Composition is the principle of assembling an ordered sequence of Task objects into a single, deterministic processing pipeline called a Workflow. A Workflow defines the complete data transformation chain: it specifies which tasks run, in what order, how data is batched, and how many concurrent workers are available. Once composed, a Workflow is itself a callable object that accepts input data and produces transformed output, enabling it to be nested inside other workflows or used as an action in a Task.
Description
The Workflow constructor takes a list of Task instances and configuration parameters that govern execution behavior. The key design decisions made at composition time are:
Task Ordering: Tasks are stored as an ordered list. During execution, each batch of data passes through tasks sequentially -- the output of task N becomes the input of task N+1. This creates a deterministic, linear processing pipeline where the developer has full control over the transformation sequence.
Batch Size: The batch parameter (default 100) controls how many data elements are grouped together for processing. Batching is critical for performance with GPU-backed pipelines, where processing many elements at once is significantly faster than processing them individually. The batch size also controls memory consumption -- larger batches use more memory but fewer processing rounds.
Worker Count: The workers parameter sets the number of concurrent workers available to the executor. If not explicitly set, it defaults to the maximum number of actions across all tasks in the workflow. This ensures that multi-action tasks can run their actions in parallel.
Naming: The optional name parameter provides an identifier for logging, scheduling, and programmatic access when workflows are managed by an Application.
Stream Processing: The optional stream parameter accepts a callable that pre-processes the input elements before they enter the task chain. This enables input transformation, filtering, or augmentation at the workflow boundary.
Usage
Use Workflow Composition when you need to:
- Chain multiple NLP tasks into a single, reusable processing pipeline (e.g., extract text, then summarize, then translate).
- Control batch processing to optimize throughput and memory usage for GPU-backed models.
- Enable concurrent execution of multi-action tasks through the worker pool.
- Create named workflows that can be referenced and executed programmatically via an Application instance.
- Pre-process input data with a stream callable before it enters the task chain.
Theoretical Basis
Workflow Composition implements the Pipeline Pattern (also known as Pipes and Filters), a classic architectural pattern where data flows through a sequence of processing stages. Each stage (Task) performs a transformation and passes its output to the next stage. This pattern provides:
- Separation of concerns: Each task encapsulates a single processing step.
- Reusability: Tasks can be reused across multiple workflows.
- Testability: Each task can be tested independently.
- Composability: Workflows can be nested since they are themselves callable.
The batching mechanism draws from mini-batch processing in machine learning, where grouping inputs into fixed-size batches balances throughput (GPU utilization) against latency (time to first result). The configurable batch size allows tuning this trade-off for different deployment scenarios.
The worker pool follows the Thread Pool Executor pattern, providing bounded concurrency for multi-action tasks without requiring the developer to manage threads directly. The default worker count (maximum actions across tasks) ensures that no task is starved of parallelism while preventing excessive thread creation.