

Principle:Dagster io Dagster Software Defined Assets

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Orchestration
Last Updated 2026-02-10 00:00 GMT

Overview

Software-defined assets are the core abstraction in Dagster, representing data artifacts (tables, files, ML models) as first-class objects with known dependencies, computation functions, and metadata.

Description

Software-defined assets combine an asset key (identity), a computation function, and upstream dependencies into a single declarative unit. Unlike traditional task-based orchestration, assets declare what data they produce rather than what steps to run. Dependencies between assets are either inferred from the names of the computation function's parameters or declared explicitly (for example, through a deps argument), which lets the framework construct the execution graph automatically.

Each software-defined asset encapsulates three core elements:

  • Asset Key: A unique identifier for the data artifact (e.g., a database table name, file path, or logical data product name).
  • Computation Function: The Python function that produces or updates the asset when executed.
  • Upstream Dependencies: References to other assets that must be materialized before this asset can be computed.

This combination allows Dagster to provide automatic lineage tracking, incremental computation, and declarative automation without requiring users to manually wire together execution steps.
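
The three elements above can be illustrated with a small, framework-free Python sketch. It is not Dagster's implementation: the asset decorator, the AssetDef class, and the REGISTRY dict are hypothetical names chosen for illustration, and dependency inference is approximated by reading parameter names with inspect.signature.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AssetDef:
    key: str               # asset key: identity of the data artifact
    compute: Callable      # computation function that produces the asset
    deps: Tuple[str, ...]  # keys of the upstream assets


# Hypothetical in-memory registry, indexed by asset key.
REGISTRY: Dict[str, AssetDef] = {}


def asset(fn: Callable) -> Callable:
    """Register fn as an asset; its parameter names are read as upstream keys."""
    deps = tuple(inspect.signature(fn).parameters)
    REGISTRY[fn.__name__] = AssetDef(key=fn.__name__, compute=fn, deps=deps)
    return fn


@asset
def raw_data():
    return [3, 1, 2, None]


@asset
def clean_data(raw_data):
    # Drop missing values and sort what remains.
    return sorted(x for x in raw_data if x is not None)


@asset
def summary(clean_data):
    return {"count": len(clean_data), "max": max(clean_data)}
```

In this sketch no step wiring is written by hand: REGISTRY["summary"].deps already records that summary depends on clean_data, which is the information an orchestrator needs for lineage and scheduling.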

Usage

Use software-defined assets when modeling any data pipeline where outputs have meaningful identity. This includes database tables, files in object storage, ML models, feature tables, and any other data artifact that is produced by computation and consumed by downstream processes. Software-defined assets are the fundamental building block of most Dagster pipelines and should be the default choice for representing data transformations.

Theoretical Basis

The asset-centric model inverts the traditional DAG-of-tasks paradigm. In a task-based system, users define a directed acyclic graph of operations (Extract, Transform, Load) and wire them together explicitly. In an asset-centric system, users define data products and their dependencies, and the orchestrator infers the execution plan from the declared asset graph.
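
How an execution plan can be inferred from a declared asset graph is sketched below using a topological sort from Python's standard graphlib module. This is an assumption-laden illustration, not Dagster's planner: ASSET_DEPS, COMPUTE, plan, and materialize_all are hypothetical names.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each key maps to the upstream keys it needs.
ASSET_DEPS = {
    "summary": {"clean_data"},
    "clean_data": {"raw_data"},
    "raw_data": set(),
}

# Hypothetical computations, one per asset key.
COMPUTE = {
    "raw_data": lambda: [3, 1, 2, None],
    "clean_data": lambda raw_data: sorted(x for x in raw_data if x is not None),
    "summary": lambda clean_data: {"count": len(clean_data)},
}


def plan(deps):
    """Return asset keys ordered so that every upstream precedes its consumers."""
    return list(TopologicalSorter(deps).static_order())


def materialize_all(deps, compute):
    """Run each asset once in planned order, passing upstream values by key."""
    values = {}
    for key in plan(deps):
        upstream = {d: values[d] for d in deps[key]}
        values[key] = compute[key](**upstream)
    return values
```

The graph is declared in arbitrary order (summary first), yet plan(ASSET_DEPS) places raw_data before clean_data and clean_data before summary, because the order is derived from the dependencies rather than from the order of declaration.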

This inversion provides several theoretical advantages:

  • Declarative Semantics: The pipeline specification describes the desired state of data rather than the procedure to achieve it.
  • Automatic Lineage: Because dependencies are declared at the data level, the system can trace the full provenance of any asset.
  • Incremental Computation: The framework can determine which assets need re-computation based on upstream changes, avoiding unnecessary work.
  • Idempotency: Asset computations are expected to be idempotent: re-running the same asset with the same inputs should produce the same output.
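
Two of the advantages listed above, lineage and incremental computation, reduce to simple graph traversals once dependencies are declared at the data level. The following sketch assumes a hypothetical four-asset graph (UPSTREAM) and two hypothetical helpers, lineage and needs_recompute; a real orchestrator would track far more state than this.

```python
from typing import Dict, Set

# Hypothetical graph: asset key -> direct upstream keys.
UPSTREAM: Dict[str, Set[str]] = {
    "raw_orders": set(),
    "raw_users": set(),
    "clean_orders": {"raw_orders"},
    "order_summary": {"clean_orders", "raw_users"},
}


def lineage(key: str, upstream: Dict[str, Set[str]]) -> Set[str]:
    """Return every transitive ancestor of key (its provenance)."""
    seen: Set[str] = set()
    stack = list(upstream[key])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(upstream[node])
    return seen


def needs_recompute(changed: str, upstream: Dict[str, Set[str]]) -> Set[str]:
    """Return the assets that transitively depend on the changed asset."""
    dirty = {changed}
    grew = True
    while grew:  # repeat until no further downstream asset is found
        grew = False
        for key, deps in upstream.items():
            if key not in dirty and deps & dirty:
                dirty.add(key)
                grew = True
    return dirty - {changed}
```

Under these assumptions, a change to raw_users marks only order_summary for recomputation, while clean_orders is left untouched because nothing upstream of it changed.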

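The following sketch illustrates the conceptual model. The task-based line is Airflow-style chaining kept as a comment for contrast; the asset-centric half uses Dagster's @asset decorator with placeholder function bodies.

# Traditional task-based approach (shown for contrast only):
#   task_extract >> task_transform >> task_load

# Asset-centric approach
from dagster import asset

@asset
def raw_data():                 # declares what is produced
    ...

@asset(deps=["raw_data"])       # declares dependency on raw_data
def clean_data():
    ...

@asset(deps=["clean_data"])     # declares dependency on clean_data
def summary():
    ...
# Execution order is inferred automatically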

Related Pages

Implemented By
