Principle:Dagster io Dagster Dynamic Partitioning
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Event_Driven |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Strategy for creating partition sets at runtime based on data discovery rather than predefined time windows or static lists.
Description
Dynamic partitioning allows the set of partitions to grow at runtime as new data is discovered. Time-based and static partitions are fixed when the definition is written. Dynamic partitions, by contrast, are created programmatically, often by sensors, when new entities appear (new users, new RSS feed entries, new API endpoints). This is essential for event-driven architectures, where the universe of data is not known in advance.
A dynamic partition set starts empty. Partition keys are added explicitly through requests, typically issued from within a sensor evaluation. Once a partition key is registered, it becomes available for materialization by any asset that references the dynamic partition definition. Keys can also be removed when they are no longer relevant.
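The lifecycle above (start empty, add keys by request, materialize only registered keys, remove stale keys) can be modeled in a few lines of plain Python. This is a minimal sketch and not Dagster's API: `DynamicPartitionRegistry`, `add`, `delete`, and `can_materialize` are hypothetical names. In Dagster the registry is a `DynamicPartitionsDefinition`, and a sensor typically emits the requests built by its `build_add_request` and `build_delete_request` methods.

```python
class DynamicPartitionRegistry:
    """Hypothetical model of a dynamic partition set (not Dagster's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._keys: list[str] = []  # starts empty; insertion order is preserved

    def add(self, keys: list[str]) -> list[str]:
        """Register keys, skipping ones already present; return the keys actually added."""
        added = []
        for key in keys:
            if key not in self._keys:
                self._keys.append(key)
                added.append(key)
        return added

    def delete(self, keys: list[str]) -> None:
        """Remove keys that are no longer relevant; unknown keys are ignored."""
        to_remove = set(keys)
        self._keys = [k for k in self._keys if k not in to_remove]

    def keys(self) -> list[str]:
        return list(self._keys)

    def can_materialize(self, key: str) -> bool:
        """Only registered keys are valid materialization targets."""
        return key in self._keys


episodes = DynamicPartitionRegistry("episodes")
episodes.add(["ep-001", "ep-002"])
episodes.add(["ep-002", "ep-003"])  # only "ep-003" is new
episodes.delete(["ep-001"])
print(episodes.keys())  # ['ep-002', 'ep-003']
print(episodes.can_materialize("ep-001"))  # False
```

Skipping keys that are already present makes repeated add requests harmless, which matters when a sensor reports the same entity more than once.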
This approach decouples the definition of what can be processed from the definition of how it is processed. The asset logic remains the same regardless of how many partitions exist; only the registry of partition keys changes over time.
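A small plain-Python illustration of that separation, not Dagster code: the per-partition function `summarize_episode` and the driver `materialize_missing` are hypothetical. The function is written once against a single partition key. Growing the registry changes which partitions get processed, never the function itself.

```python
def summarize_episode(partition_key: str) -> str:
    """Per-partition logic: it never needs to know how many partitions exist."""
    return f"summary of {partition_key}"


def materialize_missing(asset_fn, registered_keys, materialized):
    """Run asset_fn for every registered key not yet materialized; return those keys."""
    new_keys = [k for k in registered_keys if k not in materialized]
    for key in new_keys:
        materialized[key] = asset_fn(key)
    return new_keys


materialized: dict[str, str] = {}
registry = ["ep-001"]
materialize_missing(summarize_episode, registry, materialized)  # returns ['ep-001']
registry.append("ep-002")  # a new entity is discovered later
materialize_missing(summarize_episode, registry, materialized)  # returns ['ep-002']
print(sorted(materialized))  # ['ep-001', 'ep-002']
```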
Usage
Use when the partition space is not known at definition time and grows as new data arrives. Common scenarios include:
- Event-driven pipelines -- new podcast episodes discovered via RSS, new social media users, new file uploads.
- Entity-based partitioning -- each customer, tenant, or project becomes its own partition.
- API-driven discovery -- a sensor polls an external API and registers new items as partitions.
Dynamic partitioning is not appropriate when the partition space is fully known in advance (use static or time-based partitions instead) or when the key set churns heavily, with keys frequently renamed or deleted and re-created. Dynamic partitions work best as a largely append-only registry.
Theoretical Basis
Dynamic partitioning extends the partition model from a closed set to an open set. In the static model, the partition universe P is fixed at definition time:
$$P = \{p_1, p_2, \ldots, p_N\} \qquad \text{(fixed at definition time)}$$
In the dynamic model, P is runtime state: it starts empty and grows as new keys are registered through explicit add requests (deletions are omitted here for simplicity):
$$
\begin{aligned}
P(0) &= \varnothing \\
P(1) &= P(0) \cup K_1 \\
P(2) &= P(1) \cup K_2
\end{aligned}
$$
where $K_t$ is the set of new keys produced by sensor evaluation $t$.
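The recurrence can be checked directly with Python sets, where `|` is set union. The key batches below are made-up sample data, each standing for the new keys reported by one sensor evaluation. Because union is idempotent, a key reported by two evaluations is still registered only once, and each state is a superset of the one before it.

```python
from functools import reduce

# Hypothetical batches of keys reported by three successive sensor evaluations.
batches = [
    {"user-1", "user-2"},
    {"user-3"},
    {"user-2", "user-4"},  # "user-2" was already reported by the first evaluation
]

history = [set()]  # P(0) is the empty set
for batch in batches:
    history.append(history[-1] | batch)  # P(t) = P(t-1) | new keys from evaluation t

for t, partitions in enumerate(history):
    print(t, sorted(partitions))

# The final state equals a fold of union over all batches.
assert history[-1] == reduce(set.union, batches, set())
```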
This loosely follows the observer pattern: data discovery (the sensor) triggers partition creation and subsequent materialization. Unlike a classic observer, which is notified by its subject, a sensor polls the external system at a regular interval. When it detects new entities, it issues two coordinated actions:
- Register new partition keys -- adds the keys to the dynamic partition set.
- Request runs for those keys -- creates RunRequests that trigger materialization of assets for the new partitions.
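The two coordinated actions can be sketched as a pure function over one sensor evaluation. This is a library-free model: `AddPartitionsRequest`, `PartitionRunRequest`, and `evaluate_sensor` are hypothetical names. In recent Dagster releases the corresponding objects are an add request carried in the `dynamic_partitions_requests` field of a `SensorResult`, alongside `RunRequest` objects in its `run_requests` field. Keys that are already registered are filtered out, so only genuinely new entities produce work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AddPartitionsRequest:
    partitions_def_name: str
    partition_keys: tuple[str, ...]


@dataclass(frozen=True)
class PartitionRunRequest:
    partition_key: str
    run_key: str  # stable key so a scheduler can de-duplicate repeated requests


def evaluate_sensor(partitions_def_name, discovered, registered):
    """Register unseen keys and request one run per new key, in discovery order."""
    # dict.fromkeys drops duplicates within this batch while keeping order.
    new_keys = tuple(dict.fromkeys(k for k in discovered if k not in registered))
    if not new_keys:
        return None, []
    add_request = AddPartitionsRequest(partitions_def_name, new_keys)
    run_requests = [
        PartitionRunRequest(partition_key=k, run_key=f"{partitions_def_name}:{k}")
        for k in new_keys
    ]
    return add_request, run_requests


add, runs = evaluate_sensor("episodes", ["ep-2", "ep-3", "ep-4", "ep-3"], registered={"ep-1", "ep-2"})
print(add.partition_keys)  # ('ep-3', 'ep-4')
print([r.run_key for r in runs])  # ['episodes:ep-3', 'episodes:ep-4']
```

Emitting both outputs from the same evaluation keeps registration and materialization in step: a run is never requested for a key that the same evaluation did not register or find already registered.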
The cursor mechanism keeps discovery incremental: each sensor evaluation picks up only entities that appeared after the position recorded by the previous evaluation, so the same entity does not trigger duplicate partition registrations or duplicate run requests.
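A minimal sketch of cursor-based discovery, assuming the external source exposes items with a monotonically increasing integer `id` (for example, a feed that returns entries in publication order). `FEED`, `poll_feed`, and the cursor encoding are hypothetical. Dagster sensors persist a string cursor between evaluations, read through `context.cursor`, so the sketch stores the last seen id as a string.

```python
# Hypothetical external source: items with monotonically increasing ids.
FEED = [
    {"id": 1, "slug": "ep-001"},
    {"id": 2, "slug": "ep-002"},
    {"id": 3, "slug": "ep-003"},
]


def poll_feed(items, cursor):
    """Return (new partition keys, next cursor) for items newer than the cursor.

    The cursor is a string, like a sensor cursor; None means "never evaluated".
    """
    last_seen = int(cursor) if cursor is not None else 0
    new_items = sorted((i for i in items if i["id"] > last_seen), key=lambda i: i["id"])
    if not new_items:
        return [], cursor  # nothing new: leave the cursor where it was
    return [i["slug"] for i in new_items], str(new_items[-1]["id"])


keys, cursor = poll_feed(FEED[:2], None)  # first evaluation sees two items
print(keys, cursor)  # ['ep-001', 'ep-002'] 2
keys, cursor = poll_feed(FEED[:2], cursor)  # nothing new since the last evaluation
print(keys, cursor)  # [] 2
keys, cursor = poll_feed(FEED, cursor)  # a third item has been published
print(keys, cursor)  # ['ep-003'] 3
```

The sketch relies on ids never decreasing. A source that can publish items out of order would need a different cursor, such as a set of seen ids or a timestamp with a safety overlap.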