Principle:Nautechsystems Nautilus trader Live Node Lifecycle
| Field | Value |
|---|---|
| sources | https://github.com/nautechsystems/nautilus_trader, https://nautilustrader.io/docs/ |
| domains | algorithmic trading, system lifecycle, asynchronous programming |
| last_updated | 2026-02-10 12:00 GMT |
Overview
The live node lifecycle defines the strict sequence of phases -- build, start, run, stop, dispose -- that a live trading system must traverse, ensuring that every subsystem is initialised before use, that the event loop drives concurrent I/O during steady-state operation, and that shutdown proceeds gracefully without losing in-flight orders or events.
Description
A live trading node is not simply "started"; it passes through a well-defined state machine. Violating the ordering (e.g., running before building, or disposing before stopping) leads to undefined behaviour, lost messages, or resource leaks. The Live Node Lifecycle principle codifies this ordering:
- Build -- Client factories are invoked to instantiate exchange-specific data and execution clients. The node transitions from uninitialised to built. Building twice is an error.
- Run -- The kernel is started asynchronously: connections are established, instruments are loaded, execution state is reconciled, and strategies are started. The node then enters its main loop, in which the engine queues are drained (data commands, data requests, data responses, data items, risk commands, risk events, execution commands, execution events). If external message streaming is configured, a streaming task is also spawned.
- Stop -- A graceful shutdown signal is issued. Strategies are stopped, open connections are closed, and residual state is checked. The kernel awaits completion within a configurable timeout.
- Dispose -- Final cleanup: streaming tasks are cancelled, engine queues are drained, the executor is shut down, and the event loop is stopped or closed. Signal handlers registered at startup ensure that SIGINT/SIGTERM trigger the shutdown sequence (stop, then dispose) without operator intervention.
The lifecycle supports both synchronous callers (blocking on run()) and asynchronous callers (awaiting run_async()), adapting to the hosting environment.
Usage
Apply this principle whenever:
- Deploying a live trading process that must run continuously and shut down cleanly on operator signal.
- Embedding the trading node inside a larger async application where the event loop is already running.
- Implementing health checks or watchdog processes that need to inspect the node's lifecycle state.
Theoretical Basis
State Machine
The node's lifecycle can be modelled as a finite-state machine:
                 build()           run() / run_async()         stop() / signal
    CREATED ---------> BUILT ----------------------> RUNNING ---------> STOPPING
                                                                           |
                                                                           v
                                                                        STOPPED
                                                                           |
                                                                       dispose()
                                                                           |
                                                                           v
                                                                        DISPOSED
Guards prevent invalid transitions: build() raises if already built; run_async() raises if not yet built.
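The guarded transitions in the diagram above can be sketched as a small table-driven state machine. This is a toy model only: `ToyNode`, `NodeState`, and the internal `stopped` action are hypothetical names chosen for illustration, not the library's classes, and the real node may track its state differently.

```python
from enum import Enum, auto


class NodeState(Enum):
    CREATED = auto()
    BUILT = auto()
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()
    DISPOSED = auto()


# (action, current state) -> next state, mirroring the diagram.
TRANSITIONS = {
    ("build", NodeState.CREATED): NodeState.BUILT,
    ("run", NodeState.BUILT): NodeState.RUNNING,
    ("stop", NodeState.RUNNING): NodeState.STOPPING,
    ("stopped", NodeState.STOPPING): NodeState.STOPPED,  # hypothetical internal step
    ("dispose", NodeState.STOPPED): NodeState.DISPOSED,
}


class ToyNode:
    """Hypothetical stand-in that tracks lifecycle state and nothing else."""

    def __init__(self) -> None:
        self.state = NodeState.CREATED

    def _fire(self, action: str) -> None:
        # Guard: reject any transition that is not in the table.
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise RuntimeError(f"cannot {action} while {self.state.name}")
        self.state = TRANSITIONS[key]

    def build(self) -> None:
        self._fire("build")

    def run(self) -> None:
        self._fire("run")

    def stop(self) -> None:
        self._fire("stop")
        self._fire("stopped")  # the toy model stops instantly

    def dispose(self) -> None:
        self._fire("dispose")


node = ToyNode()
node.build()
node.run()
node.stop()
node.dispose()
final_state = node.state
```

Keeping the legal transitions in one table makes the guards easy to audit: any action missing from the table for the current state is rejected in a single place.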
Async Queue Draining
During the RUN phase, the node creates a set of asyncio tasks -- one per engine queue:
Tasks:
    data_engine.cmd_queue_task   -- data commands
    data_engine.req_queue_task   -- data requests
    data_engine.res_queue_task   -- data responses
    data_engine.data_queue_task  -- data items
    risk_engine.cmd_queue_task   -- risk commands
    risk_engine.evt_queue_task   -- risk events
    exec_engine.cmd_queue_task   -- execution commands
    exec_engine.evt_queue_task   -- execution events
These tasks are gathered with asyncio.gather(), which completes only once every gathered task has finished, so the node stays alive as long as any engine queue task is still running. A sentinel value or task cancellation terminates each queue loop.
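The gather-and-sentinel pattern described above can be illustrated with plain asyncio. This is a minimal sketch, not the engines' actual code: the queue names, the `drain` coroutine, and the use of `None` as the sentinel are assumptions made for the example.

```python
import asyncio

_SENTINEL = None  # assumed shutdown marker; the real engines may use another


async def drain(name, queue, handled):
    """Consume items from one queue until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is _SENTINEL:
            break
        handled.append((name, item))


async def main():
    # Hypothetical queue names standing in for the engine queues.
    names = ["data_cmd", "data_req", "risk_cmd", "exec_evt"]
    queues = {name: asyncio.Queue() for name in names}
    handled = []

    # One task per queue.
    tasks = [
        asyncio.create_task(drain(name, queue, handled))
        for name, queue in queues.items()
    ]

    queues["data_cmd"].put_nowait("subscribe")
    queues["exec_evt"].put_nowait("order_filled")

    # Shutdown: one sentinel per queue ends each drain loop.
    for queue in queues.values():
        queue.put_nowait(_SENTINEL)

    # gather() returns only after every queue task has finished.
    await asyncio.gather(*tasks)
    return handled


handled = asyncio.run(main())
```

Because each queue receives its sentinel after its pending items, every item enqueued before shutdown is still processed before the corresponding task exits.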
Graceful Shutdown Protocol
When a stop signal is received:
1. Signal handler calls node.stop()
2. Kernel awaits stop_async() -- strategies, actors, and clients are stopped
3. dispose() is called:
   a. Wait for kernel to finish stopping (with timeout)
   b. Cancel streaming tasks
   c. Dispose kernel (release Cython/Rust resources)
   d. Shut down thread pool executor
   e. Stop and/or close event loop
4. Log final loop state (is_running, is_closed)
The timeout mechanism (timeout_disconnection config parameter) prevents the process from hanging indefinitely if a client fails to disconnect.
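A bounded wait of this kind can be sketched with `asyncio.wait_for`. The example below is a simplified illustration under stated assumptions: `stop_kernel`, `stream_forever`, and `shutdown` are hypothetical stand-ins, kernel disposal is omitted, and `asyncio.run` owns the event loop, so the stop-or-close step is not shown.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def stop_kernel(delay):
    """Stand-in for stopping clients; `delay` simulates a slow disconnect."""
    await asyncio.sleep(delay)


async def stream_forever():
    """Stand-in for an external streaming task that never ends by itself."""
    while True:
        await asyncio.sleep(3600)


async def shutdown(stop_delay, timeout_disconnection):
    steps = []
    executor = ThreadPoolExecutor(max_workers=1)
    streaming = asyncio.create_task(stream_forever())

    # a. Wait for the kernel to finish stopping, but never longer than the timeout.
    try:
        await asyncio.wait_for(stop_kernel(stop_delay), timeout=timeout_disconnection)
        steps.append("stopped")
    except asyncio.TimeoutError:
        steps.append("timed_out")

    # b. Cancel the streaming task and wait for the cancellation to land.
    streaming.cancel()
    await asyncio.gather(streaming, return_exceptions=True)
    if streaming.cancelled():
        steps.append("streaming_cancelled")

    # d. Shut down the thread pool executor.
    executor.shutdown(wait=True)
    steps.append("executor_shut_down")
    return steps


fast = asyncio.run(shutdown(stop_delay=0.0, timeout_disconnection=1.0))
slow = asyncio.run(shutdown(stop_delay=5.0, timeout_disconnection=0.05))
```

In the `slow` case the disconnect outlasts the timeout, yet the remaining cleanup steps still run, which is the property the timeout exists to guarantee.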
Sync vs Async Entry Points
The run() method provides a synchronous wrapper around run_async(): if the event loop is already running, it schedules run_async() as a task on that loop; otherwise it drives the loop itself with loop.run_until_complete(). This dual-mode design accommodates both standalone deployments (where the node owns the loop) and embedded deployments (e.g., Jupyter notebooks, where the loop is already running).
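The dual-mode behaviour can be sketched with plain asyncio. The `run` and `run_async` functions below are hypothetical stand-ins that take the loop as an explicit argument; they illustrate the branching only and are not the node's actual methods.

```python
import asyncio


async def run_async():
    """Stand-in for the node's asynchronous run coroutine."""
    await asyncio.sleep(0)
    return "ran"


def run(loop):
    """Synchronous wrapper: schedule on a running loop, else drive the loop."""
    if loop.is_running():
        # Embedded case: the host owns the loop, so hand back a task.
        return loop.create_task(run_async())
    # Standalone case: block until the coroutine completes.
    return loop.run_until_complete(run_async())


# Standalone: no loop is running, so run() blocks and returns the result.
loop = asyncio.new_event_loop()
result_standalone = run(loop)
loop.close()


# Embedded: called from inside a running loop, run() returns a task.
async def host():
    task = run(asyncio.get_running_loop())
    return await task


result_embedded = asyncio.run(host())
```

In the embedded case the caller is responsible for awaiting or otherwise tracking the returned task, since `run()` returns before the coroutine has finished.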