Workflow: PrefectHQ Prefect Asset-Based Data Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Asset_Management |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
End-to-end process for building declarative, dependency-aware data pipelines using Prefect Assets with the @materialize decorator for automatic dependency tracking, caching, versioning, and lineage observability.
Description
This workflow demonstrates the asset-based approach to building data pipelines with Prefect. Instead of manually managing data dependencies and execution order, assets are defined with unique keys and materialisation functions decorated with @materialize. Dependencies between assets are automatically inferred from function parameters. Prefect handles execution ordering, caching (skip re-computation when upstream data has not changed), versioning, and full lineage tracking. Rich UI artifacts can be created for observability.
Key outputs:
- Materialised data assets at each pipeline stage (raw, processed, analytics)
- Automatic lineage graph showing asset dependencies
- Rich markdown artifacts in the Prefect UI for observability
Scope:
- From raw data fetching through processing to analytics generation
- Declarative dependency management without manual DAG definition
Usage
Execute this workflow when you want to build a data pipeline with automatic dependency resolution, caching, and lineage tracking. It is suitable for multi-stage data processing pipelines where assets represent intermediate and final data products, and where you want Prefect to manage execution order and skip unnecessary recomputation.
Execution Steps
Step 1: Define Asset Keys
Declare the assets that the pipeline will produce, each with a unique key (e.g., S3 path, pipeline URI, or database identifier). These keys serve as the identity for caching, versioning, and lineage tracking.
Key considerations:
- Keys should be descriptive and follow a consistent naming scheme
- In production, keys often map to storage locations (S3 paths, database tables)
- Assets are first-class objects that can be referenced across flows
Step 2: Fetch Raw Data
Materialise the first asset by fetching data from an external source (API, database, file system). The @materialize decorator registers this function as the producer of the raw data asset.
Key considerations:
- The function return value becomes the asset's materialised data
- Prefect tracks when this asset was last materialised for caching decisions
- In production, this connects to real data sources (APIs, databases)
Step 3: Process Data
Materialise the processed data asset by transforming the raw data. By accepting the raw data as a function parameter, Prefect automatically infers the dependency and ensures the raw data asset is materialised first.
Key considerations:
- Dependency tracking is automatic via function parameter inspection
- Processed data can be persisted to storage (S3, local files) within the function
- The step can be skipped on later runs, via Prefect's task caching, when the upstream asset has not changed
Step 4: Generate Analytics
Materialise the analytics asset from the processed data, demonstrating chained dependencies. Create rich markdown artifacts in the Prefect UI to provide summary statistics and observability dashboards.
Key considerations:
- Chained dependencies (analytics depends on processed, which depends on raw) are resolved automatically
- Artifact creation provides rich context in the Prefect UI (tables, markdown, links)
- The full dependency chain is visible in the asset lineage graph
Step 5: Orchestrate Pipeline
The flow function calls each materialisation function in sequence, passing results between them. Prefect tracks the entire dependency graph, enables caching across runs, and provides a complete execution timeline.
Key considerations:
- The flow coordinates asset materialisation without manual ordering logic
- Failed materialisations can be retried independently
- The pipeline can be scheduled for recurring execution via deployments