Principle:Dagster io Dagster External Compute Orchestration

From Leeroopedia


Field           Value
Principle Name  External Compute Orchestration
Category        Data Orchestration
Domains         Data_Engineering, Serverless, GPU_Computing
Repository      dagster-io/dagster

Overview

A strategy for running computation in external processes (serverless functions, GPU workers, containers) while still reporting asset metadata back to Dagster.

Description

External compute orchestration allows Dagster to launch and monitor computations that run outside the Dagster process. The Dagster Pipes protocol lets those external processes report asset materializations, metadata, and logs back to Dagster. This is essential for GPU workloads (Modal, SageMaker), containerized jobs (Docker, Kubernetes), and serverless functions (AWS Lambda), where the compute environment is separate from the Dagster orchestrator.

The protocol defines a two-way communication channel:

  • The orchestrator (Dagster) launches the external process and passes context (asset keys, partition keys, extras) via a context injector.
  • The external process loads that context, performs the computation, and writes results (materializations, metadata, logs) as messages, which Dagster collects through a message reader.
  • The transport layer (stdout, file, cloud storage) carries messages between the two sides, so neither side needs direct network connectivity to the other.
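The three roles above can be illustrated with a toy round trip that uses only the Python standard library. This is a minimal sketch of the pattern, not the actual Dagster Pipes wire format or API: the environment variable name TOY_PIPES_CONTEXT, the message field names, and the run_external helper are all assumptions made for illustration. The parent process plays the orchestrator, an environment variable plays the context injector, and stdout is the transport.

```python
import json
import os
import subprocess
import sys

# Hypothetical external script. It imports nothing from the orchestrator:
# it loads its context from an environment variable and writes one JSON
# message per line to stdout.
EXTERNAL_SCRIPT = r"""
import json, os
context = json.loads(os.environ["TOY_PIPES_CONTEXT"])
row_count = len(context["extras"]["rows"])
print(json.dumps({"method": "log",
                  "params": {"message": "processing " + context["asset_key"]}}))
print(json.dumps({"method": "report_asset_materialization",
                  "params": {"asset_key": context["asset_key"],
                             "metadata": {"row_count": row_count}}}))
"""


def run_external(asset_key, extras):
    """Launch the external script and return the messages it reported."""
    # "Context injector": serialize the context into the child's environment.
    context = {"asset_key": asset_key, "partition_key": None, "extras": extras}
    env = dict(os.environ, TOY_PIPES_CONTEXT=json.dumps(context))

    # Launch the external process; its stdout is the transport.
    proc = subprocess.run(
        [sys.executable, "-c", EXTERNAL_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

    # "Message reader": parse each stdout line back into a structured message.
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for message in run_external("daily_orders", {"rows": [1, 2, 3]}):
        print(message["method"], message["params"])
```

In a real deployment the launch step would target a container, a serverless function, or a GPU worker rather than a local subprocess, but the division of responsibilities stays the same.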

Usage

Use when computation must run in an external environment (GPU clusters, serverless platforms, containers) but you need the results tracked in Dagster's asset graph with proper metadata, lineage, and observability. Common scenarios include:

  • GPU workloads -- Machine learning training or inference on Modal, SageMaker, or dedicated GPU servers
  • Containerized jobs -- Docker or Kubernetes jobs that run in isolated environments
  • Serverless functions -- AWS Lambda, Google Cloud Functions, or Azure Functions triggered by Dagster
  • Legacy systems -- Existing scripts or processes that cannot be modified to import Dagster directly

Theoretical Basis

Pipes resembles the remote procedure call (RPC) pattern adapted for data orchestration: Dagster invokes work in another process and receives structured results in return. The protocol defines a message format (materialization events, metadata, logs) that flows from the external process back to Dagster through a transport layer (stdout, file, cloud storage). This decouples the execution environment from the orchestration plane, in the same spirit as the sidecar pattern common in microservices architectures.

Key theoretical properties:

  • Separation of concerns -- The orchestration plane (scheduling, dependency management, observability) is decoupled from the execution plane (compute, data processing).
  • Transport agnosticism -- The protocol is independent of the transport mechanism, allowing communication over stdout, files, S3, or any custom channel.
  • Minimal external dependency -- The external process only needs the lightweight dagster-pipes package, not the full Dagster framework.
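Transport agnosticism can be sketched by writing the reporting code once against a small channel interface and swapping the implementation underneath it. The example below is an illustrative assumption, not Dagster's implementation: MessageChannel, InMemoryChannel, FileChannel, and external_compute are hypothetical names, and the message fields and metadata value are placeholders.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Protocol


class MessageChannel(Protocol):
    """Hypothetical transport interface: anything that can carry messages."""

    def write(self, message: dict) -> None: ...
    def read_all(self) -> list[dict]: ...


class InMemoryChannel:
    """Keeps messages in a list; useful for tests."""

    def __init__(self) -> None:
        self._messages: list[dict] = []

    def write(self, message: dict) -> None:
        self._messages.append(message)

    def read_all(self) -> list[dict]:
        return list(self._messages)


class FileChannel:
    """Appends one JSON message per line to a file."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def write(self, message: dict) -> None:
        with self._path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(message) + "\n")

    def read_all(self) -> list[dict]:
        if not self._path.exists():
            return []
        with self._path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]


def external_compute(channel: MessageChannel, asset_key: str) -> None:
    """Execution-plane code: it reports results without knowing the transport."""
    channel.write({"method": "log", "params": {"message": f"training {asset_key}"}})
    channel.write({
        "method": "report_asset_materialization",
        "params": {"asset_key": asset_key, "metadata": {"accuracy": 0.5}},
    })


if __name__ == "__main__":
    memory = InMemoryChannel()
    external_compute(memory, "model")
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = FileChannel(Path(tmp) / "messages.jsonl")
        external_compute(on_disk, "model")
        # The orchestration side sees the same messages over either transport.
        print(memory.read_all() == on_disk.read_all())
```

Because external_compute depends only on the interface, a cloud-storage channel could be added later without touching the compute code, which is the property the list above describes.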
