
Workflow:PrefectHQ Prefect Asset Based Data Pipeline

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Asset_Management
Last Updated: 2026-02-09 22:00 GMT

Overview

End-to-end process for building declarative, dependency-aware data pipelines using Prefect Assets with the @materialize decorator for automatic dependency tracking, caching, versioning, and lineage observability.

Description

This workflow demonstrates the asset-based approach to building data pipelines with Prefect. Instead of manually managing data dependencies and execution order, each asset is defined by a unique key and a materialisation function decorated with @materialize. Dependencies between assets are inferred automatically from function parameters. Prefect handles execution ordering, caching (re-computation is skipped when upstream data has not changed), versioning, and full lineage tracking. Rich UI artifacts can be created for observability.

Key outputs:

  • Materialised data assets at each pipeline stage (raw, processed, analytics)
  • Automatic lineage graph showing asset dependencies
  • Rich markdown artifacts in the Prefect UI for observability

Scope:

  • From raw data fetching through processing to analytics generation
  • Declarative dependency management without manual DAG definition

Usage

Execute this workflow when you want to build a data pipeline with automatic dependency resolution, caching, and lineage tracking. It is suitable for multi-stage data processing pipelines where assets represent intermediate and final data products, and where you want Prefect to manage execution order and skip unnecessary recomputation.

Execution Steps

Step 1: Define Asset Keys

Declare the assets that the pipeline will produce, each with a unique key (e.g., S3 path, pipeline URI, or database identifier). These keys serve as the identity for caching, versioning, and lineage tracking.

Key considerations:

  • Keys should be descriptive and follow a consistent naming scheme
  • In production, keys often map to storage locations (S3 paths, database tables)
  • Assets are first-class objects that can be referenced across flows
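As a minimal sketch of a consistent naming scheme, asset keys can be plain URI strings (in Prefect these strings, or `prefect.assets.Asset` objects built from them, are what gets passed to @materialize). The bucket and stage names below are hypothetical:

```python
# Hypothetical asset keys following one consistent URI scheme:
# s3://<bucket>/<stage>/<name>. In production these would point at
# real storage locations (S3 paths, database tables).
RAW_DATA_KEY = "s3://example-pipeline/raw/events"
PROCESSED_DATA_KEY = "s3://example-pipeline/processed/events"
ANALYTICS_KEY = "s3://example-pipeline/analytics/daily-summary"

ASSET_KEYS = [RAW_DATA_KEY, PROCESSED_DATA_KEY, ANALYTICS_KEY]

# Keys are the identity used for caching, versioning, and lineage,
# so each one must be unique across the pipeline.
assert len(set(ASSET_KEYS)) == len(ASSET_KEYS)
```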

Step 2: Fetch Raw Data

Materialise the first asset by fetching data from an external source (API, database, file system). The @materialize decorator registers this function as the producer of the raw data asset.

Key considerations:

  • The function return value becomes the asset's materialised data
  • Prefect tracks when this asset was last materialised for caching decisions
  • In production, this connects to real data sources (APIs, databases)

Step 3: Process Data

Materialise the processed data asset by transforming the raw data. By accepting the raw data as a function parameter, Prefect automatically infers the dependency and ensures the raw data asset is materialised first.

Key considerations:

  • Dependency tracking is automatic via function parameter inspection
  • Processed data can be persisted to storage (S3, local files) within the function
  • The step is skipped if the upstream asset has not changed (caching)

Step 4: Generate Analytics

Materialise the analytics asset from the processed data, demonstrating chained dependencies. Create rich markdown artifacts in the Prefect UI to provide summary statistics and observability dashboards.

Key considerations:

  • Chained dependencies (analytics depends on processed, which depends on raw) are resolved automatically
  • Artifact creation provides rich context in the Prefect UI (tables, markdown, links)
  • The full dependency chain is visible in the asset lineage graph

Step 5: Orchestrate Pipeline

The flow function calls each materialisation function in sequence, passing results between them. Prefect tracks the entire dependency graph, enables caching across runs, and provides a complete execution timeline.

Key considerations:

  • The flow coordinates asset materialisation without manual ordering logic
  • Failed materialisations can be retried independently
  • The pipeline can be scheduled for recurring execution via deployments

Execution Diagram

GitHub URL

Workflow Repository