
Workflow:PrefectHQ Prefect Asset Based Data Pipeline

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Asset_Management
Last Updated: 2026-02-09 22:00 GMT

Overview

End-to-end process for building declarative, dependency-aware data pipelines using Prefect Assets with the @materialize decorator for automatic dependency tracking, caching, versioning, and lineage observability.

Description

This workflow demonstrates the asset-based approach to building data pipelines with Prefect. Instead of manually managing data dependencies and execution order, each asset is defined by a unique key and a materialisation function decorated with @materialize. Dependencies between assets are inferred automatically from function parameters. Prefect handles execution ordering, caching (re-computation is skipped when upstream data has not changed), versioning, and full lineage tracking. Rich UI artifacts can be created for observability.

Key outputs:

  • Materialised data assets at each pipeline stage (raw, processed, analytics)
  • Automatic lineage graph showing asset dependencies
  • Rich markdown artifacts in the Prefect UI for observability

Scope:

  • From raw data fetching through processing to analytics generation
  • Declarative dependency management without manual DAG definition

Usage

Execute this workflow when you want to build a data pipeline with automatic dependency resolution, caching, and lineage tracking. It is suitable for multi-stage data processing pipelines where assets represent intermediate and final data products, and where you want Prefect to manage execution order and skip unnecessary recomputation.

Execution Steps

Step 1: Define Asset Keys

Declare the assets that the pipeline will produce, each with a unique key (e.g., S3 path, pipeline URI, or database identifier). These keys serve as the identity for caching, versioning, and lineage tracking.

Key considerations:

  • Keys should be descriptive and follow a consistent naming scheme
  • In production, keys often map to storage locations (S3 paths, database tables)
  • Assets are first-class objects that can be referenced across flows
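As a minimal sketch of a consistent naming scheme, asset keys can be plain URI strings (in Prefect these strings, or `prefect.assets.Asset` objects built from them, are what gets passed to @materialize). The bucket and stage names below are hypothetical:

```python
# Hypothetical asset keys following one consistent URI scheme:
# s3://<bucket>/<stage>/<name>. In production these would point at
# real storage locations (S3 paths, database tables).
RAW_DATA_KEY = "s3://example-pipeline/raw/events"
PROCESSED_DATA_KEY = "s3://example-pipeline/processed/events"
ANALYTICS_KEY = "s3://example-pipeline/analytics/daily-summary"

ASSET_KEYS = [RAW_DATA_KEY, PROCESSED_DATA_KEY, ANALYTICS_KEY]

# Keys are the identity used for caching, versioning, and lineage,
# so each one must be unique across the pipeline.
assert len(set(ASSET_KEYS)) == len(ASSET_KEYS)
```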

Step 2: Fetch Raw Data

Materialise the first asset by fetching data from an external source (API, database, file system). The @materialize decorator registers this function as the producer of the raw data asset.

Key considerations:

  • The function return value becomes the asset's materialised data
  • Prefect tracks when this asset was last materialised for caching decisions
  • In production, this connects to real data sources (APIs, databases)

Step 3: Process Data

Materialise the processed data asset by transforming the raw data. By accepting the raw data as a function parameter, Prefect automatically infers the dependency and ensures the raw data asset is materialised first.

Key considerations:

  • Dependency tracking is automatic via function parameter inspection
  • Processed data can be persisted to storage (S3, local files) within the function
  • The step is skipped if the upstream asset has not changed (caching)

Step 4: Generate Analytics

Materialise the analytics asset from the processed data, demonstrating chained dependencies. Create rich markdown artifacts in the Prefect UI to provide summary statistics and observability dashboards.

Key considerations:

  • Chained dependencies (analytics depends on processed, which depends on raw) are resolved automatically
  • Artifact creation provides rich context in the Prefect UI (tables, markdown, links)
  • The full dependency chain is visible in the asset lineage graph

Step 5: Orchestrate Pipeline

The flow function calls each materialisation function in sequence, passing results between them. Prefect tracks the entire dependency graph, enables caching across runs, and provides a complete execution timeline.

Key considerations:

  • The flow coordinates asset materialisation without manual ordering logic
  • Failed materialisations can be retried independently
  • The pipeline can be scheduled for recurring execution via deployments

Execution Diagram

GitHub URL

Workflow Repository