Principle:Langgenius Dify Workflow Execution Monitoring
| Knowledge Sources | |
|---|---|
| Domains | Workflow, Observability, Execution Monitoring |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Workflow execution monitoring is the real-time observability system that tracks running workflows with per-node status updates delivered through streaming events, enabling developers to observe execution progress, diagnose failures, and review historical run data.
Description
When a workflow executes, the system exposes its state through two complementary mechanisms: real-time streaming events during execution and queryable run history after completion.
SSE Event Streams: During workflow execution, the backend emits Server-Sent Events (SSE) that report the status of the workflow as a whole and each individual node. These events arrive in real time, allowing the visual builder to animate the canvas -- highlighting nodes as they begin execution, showing progress indicators, and marking nodes as succeeded or failed as results come in. This streaming approach is essential for workflows that may take minutes to complete due to LLM calls, external API requests, or iteration over large datasets.
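A minimal sketch of how a client might decode such a stream. It assumes each SSE message carries a JSON object on its `data:` lines and that the object has an `event` name. The `StreamEvent` shape and the `parseSseChunk` helper are hypothetical and are not taken from the Dify codebase.

```typescript
// Hypothetical shape of a decoded event; the real payload fields may differ.
interface StreamEvent {
  event: string;
  data: Record<string, unknown>;
}

// Split raw SSE text into messages (separated by a blank line) and decode
// the JSON carried on each message's `data:` lines.
function parseSseChunk(raw: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  const messages = raw.split(/\r?\n\r?\n/);
  for (const message of messages) {
    const dataLines: string[] = [];
    for (const line of message.split(/\r?\n/)) {
      // In the SSE wire format, payload lines start with the "data:" field.
      if (line.indexOf("data:") === 0) {
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length === 0) continue;
    try {
      events.push(JSON.parse(dataLines.join("\n")) as StreamEvent);
    } catch (_err) {
      // Skip payloads that are not valid JSON (for example keep-alive pings).
    }
  }
  return events;
}
```

This sketch expects complete messages. A real client would also need to buffer partial chunks, because a network read can end in the middle of a message.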
Workflow Status State Machine: The overall workflow tracks its execution through a well-defined set of states:
- Waiting -- The workflow has been submitted but has not begun processing
- Running -- At least one node is actively executing
- Succeeded -- All nodes completed successfully and the final output has been produced
- Failed -- A node encountered an unrecoverable error
- Stopped -- The execution was manually cancelled by the user
Node Status State Machine: Each node within the workflow tracks its own status independently, with a richer set of states:
- NotStart -- The node has not yet been reached in the execution order
- Waiting -- The node is queued and waiting for its dependencies to complete
- Listening -- The node is waiting for an external event (e.g., a webhook or plugin trigger)
- Running -- The node is actively processing
- Succeeded -- The node completed without error
- Failed -- The node encountered an error
- Exception -- The node encountered an unexpected system-level error
- Retry -- The node failed and is being retried according to its retry policy
- Stopped -- The node was stopped due to workflow cancellation
Execution Tracing: Each node execution produces a NodeTracing record that captures the node's unique execution ID, the node definition ID, final status, wall-clock elapsed time, resolved inputs, produced outputs, and any error information. These traces persist beyond the execution and can be queried from the run history.
Run History: Completed workflow executions are stored and accessible through a run history API. This allows developers to review past executions, compare results across runs, and identify patterns in failures or performance degradation.
Usage
Execution monitoring is used during:
- Live debugging: Watching a workflow execute in real time to observe the flow of data through nodes
- Failure diagnosis: Examining which node failed, what inputs it received, and what error occurred
- Performance analysis: Reviewing elapsed times per node to identify bottlenecks
- Regression detection: Comparing run history across workflow versions to spot behavioral changes
- Operational monitoring: Tracking the health and success rate of production workflows
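As an illustration of the operational monitoring case, a success rate can be derived from stored runs. This is a minimal sketch that assumes the run history has already been loaded into an array. `RunSummary` and `successRate` are hypothetical names, and the lowercase status strings are an assumption that mirrors the status values shown in the event flow on this page.

```typescript
// Assumed status strings; the values actually stored may differ.
type WorkflowStatus = "waiting" | "running" | "succeeded" | "failed" | "stopped";

// Hypothetical, trimmed-down view of one entry in the run history.
interface RunSummary {
  id: string;
  status: WorkflowStatus;
  elapsed_time: number;
}

// Fraction of runs that succeeded among runs that either succeeded or failed.
// Returns null when no run has reached one of those two states.
function successRate(runs: RunSummary[]): number | null {
  let finished = 0;
  let succeeded = 0;
  for (const run of runs) {
    if (run.status === "succeeded") {
      succeeded += 1;
      finished += 1;
    } else if (run.status === "failed") {
      finished += 1;
    }
  }
  return finished === 0 ? null : succeeded / finished;
}
```

Stopped runs are left out of the denominator here on the assumption that a manual cancellation is not a failure. A stricter health metric could count them.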
Theoretical Basis
Workflow Status State Machine
                  +--------- stop --------+
                  |                       |
                  v                       |
[Waiting] --> [Running] --> [Succeeded]
                  |
                  +--> [Failed]
                  |
                  +--> [Stopped]
The workflow begins in Waiting when submitted, transitions to Running when the first node begins execution, and ends in one of three terminal states: Succeeded, Failed, or Stopped.
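The workflow transitions can be written down as a lookup table. This is a sketch under stated assumptions: it encodes only the transitions described above, the type and helper names are hypothetical, and the string literals reuse the state names from this page rather than any serialized values.

```typescript
type WorkflowStatus = "Waiting" | "Running" | "Succeeded" | "Failed" | "Stopped";

// Allowed next states for each workflow state, as described on this page.
const WORKFLOW_TRANSITIONS: Record<WorkflowStatus, WorkflowStatus[]> = {
  Waiting: ["Running"],
  Running: ["Succeeded", "Failed", "Stopped"],
  Succeeded: [],
  Failed: [],
  Stopped: [],
};

// True when the table lists `to` as a direct successor of `from`.
function canTransition(from: WorkflowStatus, to: WorkflowStatus): boolean {
  return WORKFLOW_TRANSITIONS[from].indexOf(to) !== -1;
}

// A state is terminal when it has no outgoing transitions.
function isTerminal(status: WorkflowStatus): boolean {
  return WORKFLOW_TRANSITIONS[status].length === 0;
}
```

A monitoring client could use `isTerminal` to decide when to stop listening for further events on a run.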
Node Status State Machine
[NotStart] --> [Waiting] --> [Running] --> [Succeeded]
                   |             |
                   v             +--> [Failed] --> [Retry] --> [Running]
              [Listening]        |                    |
                   |             +--> [Exception]     +--> [Failed]
                   v             |
               [Running]     [Stopped]
The node state machine is more complex because nodes can enter a Listening state while waiting for external events and can transition through Retry cycles before ultimately succeeding or failing.
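The node lifecycle can be checked the same way. The table below is one reading of the node diagram, not a confirmed specification, and `NODE_TRANSITIONS` and `isValidPath` are hypothetical names. It is meant to show how a Listening detour or a Retry cycle fits into a valid status history.

```typescript
type NodeStatus =
  | "NotStart" | "Waiting" | "Listening" | "Running"
  | "Succeeded" | "Failed" | "Exception" | "Retry" | "Stopped";

// Assumed transitions, read off the node state diagram on this page.
const NODE_TRANSITIONS: Record<NodeStatus, NodeStatus[]> = {
  NotStart: ["Waiting"],
  Waiting: ["Running", "Listening"],
  Listening: ["Running"],
  Running: ["Succeeded", "Failed", "Exception", "Stopped"],
  Failed: ["Retry"],
  Retry: ["Running", "Failed"],
  Succeeded: [],
  Exception: [],
  Stopped: [],
};

// Check that every consecutive pair in a status history is an allowed step.
function isValidPath(path: NodeStatus[]): boolean {
  for (let i = 1; i < path.length; i++) {
    if (NODE_TRANSITIONS[path[i - 1]].indexOf(path[i]) === -1) {
      return false;
    }
  }
  return true;
}

// A node that fails once, is retried, and then succeeds.
const retried: NodeStatus[] = [
  "NotStart", "Waiting", "Running", "Failed", "Retry", "Running", "Succeeded",
];
console.log(isValidPath(retried)); // true
```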
SSE Event Flow
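The streaming event protocol delivers events in a predictable sequence. The example below shows a three-node run in which nodes A and B succeed and node C fails, so the workflow finishes as failed: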
workflow_started
    |
    +--> node_started (node A)
    |        |
    |        +--> node_finished (node A, status: succeeded)
    |
    +--> node_started (node B)
    |        |
    |        +--> node_finished (node B, status: succeeded)
    |
    +--> node_started (node C)
    |        |
    |        +--> node_finished (node C, status: failed)
    |
workflow_finished (status: failed)
For iteration and loop nodes, additional events such as iteration_started, iteration_next, loop_started, and loop_next are emitted to track the progress of each cycle within these compound nodes.
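One way a client could consume this sequence is to fold each event into a small view of the run, which is roughly what a canvas needs in order to highlight nodes. This is a minimal sketch: the event names come from this page, while the flat `node_id` and `status` fields and the `RunEvent`, `RunView`, and `applyEvent` names are assumptions for illustration.

```typescript
// Hypothetical, simplified event payload.
interface RunEvent {
  event: string;
  node_id?: string;
  status?: string;
}

// What the UI tracks: overall status plus the latest status per node.
interface RunView {
  workflowStatus: string;
  nodes: Record<string, string>;
}

// Return a new view with one event applied; the input view is not mutated.
function applyEvent(view: RunView, ev: RunEvent): RunView {
  const nodes: Record<string, string> = {};
  for (const key of Object.keys(view.nodes)) {
    nodes[key] = view.nodes[key];
  }
  let workflowStatus = view.workflowStatus;
  switch (ev.event) {
    case "workflow_started":
      workflowStatus = "running";
      break;
    case "node_started":
      if (ev.node_id !== undefined) nodes[ev.node_id] = "running";
      break;
    case "node_finished":
      if (ev.node_id !== undefined && ev.status !== undefined) {
        nodes[ev.node_id] = ev.status;
      }
      break;
    case "workflow_finished":
      if (ev.status !== undefined) workflowStatus = ev.status;
      break;
    default:
      // Iteration and loop events are ignored in this sketch.
      break;
  }
  return { workflowStatus: workflowStatus, nodes: nodes };
}
```

Returning a fresh object on every event keeps the function easy to use with state containers that detect changes by reference.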
Observability Data Model
WorkflowRun
  |-- id: string
  |-- status: WorkflowRunningStatus
  |-- elapsed_time: number
  |-- total_tokens: number
  |-- created_at: number
  |
  +-- NodeTracing[] (one per executed node)
        |-- id: string (unique execution ID)
        |-- node_id: string (definition ID)
        |-- status: NodeRunningStatus
        |-- elapsed_time: number
        |-- inputs: Record<string, any>
        |-- outputs: Record<string, any>
        |-- error: string | null
This hierarchical data model allows monitoring tools to present both the high-level workflow status and drill down into individual node executions for detailed inspection.
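The model can be expressed as types with two small drill-down helpers. This is a sketch under stated assumptions: the status types are simplified to plain strings, the `tracing` field name for the nested array is hypothetical, and `failedNodes` and `slowestNode` are illustrative helpers rather than part of any existing API.

```typescript
interface NodeTracing {
  id: string;            // unique execution ID
  node_id: string;       // definition ID
  status: string;        // NodeRunningStatus, simplified to a string here
  elapsed_time: number;
  inputs: Record<string, unknown>;
  outputs: Record<string, unknown>;
  error: string | null;
}

interface WorkflowRun {
  id: string;
  status: string;        // WorkflowRunningStatus, simplified to a string here
  elapsed_time: number;
  total_tokens: number;
  created_at: number;
  tracing: NodeTracing[]; // hypothetical field name for the nested records
}

// Failure diagnosis: every node execution that recorded an error.
function failedNodes(run: WorkflowRun): NodeTracing[] {
  return run.tracing.filter(function (t) {
    return t.error !== null;
  });
}

// Performance analysis: the node execution with the largest elapsed time,
// or null when the run has no traces.
function slowestNode(run: WorkflowRun): NodeTracing | null {
  let slowest: NodeTracing | null = null;
  for (const trace of run.tracing) {
    if (slowest === null || trace.elapsed_time > slowest.elapsed_time) {
      slowest = trace;
    }
  }
  return slowest;
}
```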