Principle: Dagster Sensor-Driven Pipelines
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Event_Driven |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Strategy for triggering pipeline execution based on external events detected through periodic polling sensors.
Description
Sensors are periodic evaluation functions that poll external systems for changes and trigger pipeline runs in response. They maintain state across evaluations via cursors (e.g., etags, timestamps, sequence numbers) to detect only new events. Sensors can create RunRequests for jobs, add dynamic partitions, and report skip reasons when no action is needed. They integrate with the Dagster daemon for continuous background evaluation.
A sensor function is invoked at regular intervals (controlled by minimum_interval_seconds). On each invocation, it receives a context object containing the cursor from the previous evaluation. The function checks an external system for changes since the cursor, and returns one of:
- SensorResult -- containing run requests, dynamic partition additions, and an updated cursor.
- SkipReason -- when no action is needed, providing a human-readable explanation visible in the Dagster UI.
This model enables event-driven architectures within Dagster without requiring external webhook infrastructure.
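The evaluation contract above can be sketched in plain Python. The dict-based return values and the `evaluate_sensor` / `external_files` names below are hypothetical simplifications for illustration, not Dagster's real classes; a real sensor would return `dagster.SensorResult` or `dagster.SkipReason`.

```python
# Sketch of a cursor-based sensor evaluation, modeling Dagster's contract
# with plain dicts. `external_files` stands in for a cloud-storage listing;
# the cursor is the lexically greatest key already processed.

def evaluate_sensor(cursor, external_files):
    """Return either a skip reason or run requests plus an updated cursor."""
    # Detect only files newer than the cursor (append-only assumption).
    new_files = sorted(f for f in external_files if cursor is None or f > cursor)
    if not new_files:
        # Equivalent of returning SkipReason: a human-readable explanation.
        return {"skip_reason": f"no new files past cursor {cursor!r}"}
    # Equivalent of returning SensorResult(run_requests=..., cursor=...).
    return {
        "run_requests": [{"run_key": f} for f in new_files],
        "cursor": new_files[-1],  # advance past every processed event
    }

# First evaluation sees both files; the cursor advances to the latest one.
first = evaluate_sensor(None, ["a.csv", "b.csv"])
# Re-polling with the advanced cursor and no new files yields a skip.
second = evaluate_sensor(first["cursor"], ["a.csv", "b.csv"])
```

Using the file key as the `run_key` also deduplicates runs on Dagster's side: a `RunRequest` with a previously seen run key is not launched again.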
Usage
Use when pipeline execution should be driven by external events: new files appearing in cloud storage, RSS feed updates, database row insertions, API state changes, or webhook-like triggers. Sensors provide the event-detection mechanism for event-driven architectures.
Sensors are also the primary mechanism for populating dynamic partition sets. When a sensor discovers a new entity, it simultaneously registers the entity as a partition key and requests a run for that partition.
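The discovery-plus-registration step can be sketched as follows. The `discover_entities` function and its list/dict structures are hypothetical stand-ins for Dagster's `AddDynamicPartitionsRequest` and a partitioned `RunRequest`:

```python
# Sketch of a sensor that populates a dynamic partition set: each unseen
# entity is registered as a partition key and gets one run request.

def discover_entities(known_partitions, external_entities):
    """Register unseen entities as partitions and request one run per new key."""
    new_keys = sorted(set(external_entities) - set(known_partitions))
    return {
        # Equivalent of SensorResult.dynamic_partitions_requests
        "add_partitions": new_keys,
        # One partitioned run per newly discovered entity
        "run_requests": [{"partition_key": k} for k in new_keys],
    }

result = discover_entities(["customer_a"], ["customer_a", "customer_b"])
```

Registering the partition and requesting its run in the same sensor result keeps the partition set and the run history consistent: no run is launched for a key Dagster does not know about.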
Theoretical Basis
Sensors implement the polling-based observer pattern. The sensor function is the observer, and the external system is the subject. Rather than the subject pushing notifications (as in webhooks), the observer periodically pulls state and computes the diff.
The cursor mechanism is central to correctness. It provides:
- Exactly-once event detection -- in normal operation, each event is detected in exactly one sensor evaluation, because the cursor advances past processed events. (If the daemon fails between submitting runs and committing the cursor, an event may be re-detected, so deduplicate with run keys where this matters.)
- Crash recovery -- the cursor is persisted by the Dagster instance. If the daemon restarts, the sensor resumes from its last committed cursor.
- Idempotency -- given the same cursor, the sensor produces the same result (assuming the external system is append-only).
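These properties can be demonstrated against a toy append-only event log. Using integer sequence numbers as the cursor is an assumption of this sketch, not a Dagster requirement:

```python
# Toy append-only event log; the cursor is the last processed sequence number.
event_log = [(1, "created"), (2, "updated"), (3, "deleted")]

def detect(cursor, log):
    """Return events strictly past the cursor and the advanced cursor."""
    new = [e for e in log if e[0] > cursor]
    return new, (new[-1][0] if new else cursor)

# Idempotency: the same cursor over the same log yields the same result.
a = detect(1, event_log)
b = detect(1, event_log)
# Exactly-once detection: once the cursor has advanced,
# earlier events are never re-detected.
events, cursor = detect(1, event_log)
later, _ = detect(cursor, event_log)
```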
The trade-off compared to push-based systems (webhooks) is latency vs. simplicity:
| Property | Polling (Sensors) | Push (Webhooks) |
|---|---|---|
| Latency | Bounded by polling interval | Near real-time |
| Infrastructure | No external setup needed | Requires webhook endpoint |
| Reliability | Built-in retry via daemon | Requires delivery guarantees |
| State management | Cursor in Dagster storage | External state required |
In pseudocode, the sensor evaluation loop is:
while daemon_is_running:
    for sensor in registered_sensors:
        if time_since_last_eval(sensor) >= sensor.minimum_interval_seconds:
            cursor = load_cursor(sensor)
            result = sensor.evaluate(cursor)
            if result.has_run_requests:
                submit_runs(result.run_requests)
            register_partitions(result.dynamic_partitions_requests)
            save_cursor(sensor, result.cursor)
    sleep(tick_interval)
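The loop can be turned into a runnable single-tick simulation. Everything here -- the `Sensor` class, the in-memory cursor store, the `submitted_runs` list -- is a hypothetical harness for illustration, not the Dagster daemon's actual implementation:

```python
import time

cursors = {}          # stands in for cursor storage on the Dagster instance
submitted_runs = []   # stands in for the run queue

class Sensor:
    def __init__(self, name, minimum_interval_seconds, evaluate):
        self.name = name
        self.minimum_interval_seconds = minimum_interval_seconds
        self.last_eval = 0.0  # never evaluated; due on the first tick
        self.evaluate = evaluate

def tick(sensors, now):
    """One pass of the daemon loop: evaluate due sensors, persist cursors."""
    for sensor in sensors:
        if now - sensor.last_eval >= sensor.minimum_interval_seconds:
            cursor = cursors.get(sensor.name)
            result = sensor.evaluate(cursor)
            if result.get("run_requests"):
                submitted_runs.extend(result["run_requests"])
            # Commit the cursor only after runs are submitted.
            cursors[sensor.name] = result.get("cursor", cursor)
            sensor.last_eval = now

# A sensor whose external system yields one event regardless of cursor.
demo = Sensor("demo", 30, lambda cur: {"run_requests": [{"run_key": "evt-1"}],
                                       "cursor": "evt-1"})
tick([demo], now=time.time())
```

Note the ordering: the cursor is saved after run submission, which is why a crash between the two steps yields at-least-once rather than exactly-once behavior.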