Principle:Nautechsystems Nautilus trader Live Node Lifecycle

Field	Value
sources	https://github.com/nautechsystems/nautilus_trader, https://nautilustrader.io/docs/
domains	algorithmic trading, system lifecycle, asynchronous programming
last_updated	2026-02-10 12:00 GMT

Overview

The live node lifecycle defines the strict sequence of phases -- build, run, stop, dispose -- that a live trading system must traverse. The ordering ensures that every subsystem is initialised before use, that the event loop drives concurrent I/O during steady-state operation, and that shutdown proceeds gracefully without losing in-flight orders or events.

Description

A live trading node is not simply "started"; it passes through a well-defined state machine. Violating the ordering (e.g., running before building, or disposing before stopping) leads to undefined behaviour, lost messages, or resource leaks. The Live Node Lifecycle principle codifies this ordering:

Build -- Client factories are invoked to instantiate exchange-specific data and execution clients. The node transitions from uninitialised to built. Building twice is an error.
Run -- The kernel is started asynchronously: connections are established, instruments are loaded, execution state is reconciled, strategies are started. The event loop enters its main loop, draining engine queues (data commands, data requests, data responses, data items, risk commands, risk events, execution commands, execution events). If external message streaming is configured, a streaming task is also spawned.
Stop -- A graceful shutdown signal is issued. Strategies are stopped, open connections are closed, and residual state is checked. The kernel awaits completion within a configurable timeout.
Dispose -- Final cleanup: streaming tasks are cancelled, engine queues are drained, the executor is shut down, and the event loop is stopped or closed. Signal handlers registered at startup ensure that SIGINT/SIGTERM initiate shutdown automatically by calling stop().

The lifecycle supports both synchronous callers (blocking on run()) and asynchronous callers (awaiting run_async()), adapting to the hosting environment.

Usage

Apply this principle whenever:

Deploying a live trading process that must run continuously and shut down cleanly on operator signal.
Embedding the trading node inside a larger async application where the event loop is already running.
Implementing health checks or watchdog processes that need to inspect the node's lifecycle state.

Theoretical Basis

State Machine

The node's lifecycle can be modelled as a finite-state machine:

           build()         run() / run_async()       stop() / signal
CREATED ---------> BUILT ----------------------> RUNNING ---------> STOPPING
                                                                       |
                                                                       v
                                                                   STOPPED
                                                                       |
                                                                 dispose()
                                                                       |
                                                                       v
                                                                   DISPOSED

Guards prevent invalid transitions: build() raises if already built; run_async() raises if not yet built.
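
The guards can be illustrated with a small table-driven state machine. This is a minimal sketch, not the library's implementation: ToyNode, NodeState, TRANSITIONS, and the "stopped" action (standing in for the internal STOPPING to STOPPED step) are hypothetical names introduced only for illustration.

```python
from enum import Enum, auto


class NodeState(Enum):
    CREATED = auto()
    BUILT = auto()
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()
    DISPOSED = auto()


# Allowed (action, current state) -> next state pairs, mirroring the diagram.
TRANSITIONS = {
    ("build", NodeState.CREATED): NodeState.BUILT,
    ("run", NodeState.BUILT): NodeState.RUNNING,
    ("stop", NodeState.RUNNING): NodeState.STOPPING,
    ("stopped", NodeState.STOPPING): NodeState.STOPPED,
    ("dispose", NodeState.STOPPED): NodeState.DISPOSED,
}


class ToyNode:
    """Hypothetical stand-in for a live node; not the library's class."""

    def __init__(self) -> None:
        self.state = NodeState.CREATED

    def trigger(self, action: str) -> None:
        key = (action, self.state)
        if key not in TRANSITIONS:
            # Guard: reject any transition the diagram does not allow.
            raise RuntimeError(f"cannot {action!r} from state {self.state.name}")
        self.state = TRANSITIONS[key]
```

With this table, calling trigger("run") on a freshly created ToyNode raises, as does a second trigger("build"), which matches the two guards described above.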

Async Queue Draining

During the RUN phase, the node creates a set of asyncio tasks -- one per engine queue:

Tasks:
  data_engine.cmd_queue_task    -- data commands
  data_engine.req_queue_task    -- data requests
  data_engine.res_queue_task    -- data responses
  data_engine.data_queue_task   -- data items
  risk_engine.cmd_queue_task    -- risk commands
  risk_engine.evt_queue_task    -- risk events
  exec_engine.cmd_queue_task    -- execution commands
  exec_engine.evt_queue_task    -- execution events

These tasks are awaited together with asyncio.gather(), which completes only when every task has finished, so the node stays alive as long as any engine queue task is still running. A sentinel value placed on a queue, or cancellation of its task, terminates that queue's loop.
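
The gather-and-sentinel pattern can be sketched with plain asyncio. This is an illustrative sketch under stated assumptions, not the engines' actual code: the queue names, the drain() and run_queues() helpers, and the use of None as the sentinel are all hypothetical.

```python
import asyncio

_SENTINEL = None  # assumption: None marks end-of-stream in this sketch


async def drain(name: str, queue: asyncio.Queue, handled: list) -> None:
    """Consume items from one queue until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is _SENTINEL:
            break
        handled.append((name, item))


async def run_queues() -> list:
    handled: list = []
    # Three hypothetical queues standing in for the eight engine queues.
    queues = {name: asyncio.Queue() for name in ("data_cmd", "risk_evt", "exec_cmd")}
    tasks = [asyncio.create_task(drain(n, q, handled)) for n, q in queues.items()]

    queues["data_cmd"].put_nowait("subscribe")
    queues["exec_cmd"].put_nowait("submit_order")

    # Shutdown: one sentinel per queue ends each drain loop.
    for q in queues.values():
        q.put_nowait(_SENTINEL)

    # gather() returns only after every drain task has finished.
    await asyncio.gather(*tasks)
    return handled
```

Calling asyncio.run(run_queues()) returns the two handled messages once all three drain loops have seen their sentinel. Cancelling the tasks instead of sending sentinels would also end the loops, but items still sitting in the queues would then go unprocessed.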

Graceful Shutdown Protocol

When a stop signal is received:

1. Signal handler calls node.stop()
2. Kernel awaits stop_async() -- strategies, actors, and clients are stopped
3. dispose() is called:
   a. Wait for kernel to finish stopping (with timeout)
   b. Cancel streaming tasks
   c. Dispose kernel (release Cython/Rust resources)
   d. Shut down thread pool executor
   e. Stop and/or close event loop
4. Log final loop state (is_running, is_closed)

The timeout mechanism (timeout_disconnection config parameter) prevents the process from hanging indefinitely if a client fails to disconnect.
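
The bounded wait in steps 2 and 3a, followed by steps 3b and 3d, can be sketched as follows. This is a sketch under assumptions rather than the kernel's code: slow_disconnect() and shutdown() are hypothetical helpers, and the timeout_disconnection argument only borrows the name of the config parameter.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def slow_disconnect(delay: float) -> None:
    """Hypothetical client disconnect that takes `delay` seconds."""
    await asyncio.sleep(delay)


async def shutdown(disconnect_delay: float, timeout_disconnection: float) -> list:
    steps = []
    executor = ThreadPoolExecutor(max_workers=1)
    # Stand-in for a long-lived streaming task.
    streaming = asyncio.create_task(asyncio.sleep(3600))

    # Steps 2 / 3a: wait for the stop to finish, but never longer than the timeout.
    try:
        await asyncio.wait_for(slow_disconnect(disconnect_delay), timeout=timeout_disconnection)
        steps.append("stopped")
    except asyncio.TimeoutError:
        steps.append("stop timed out")

    # Step 3b: cancel the streaming task and wait for the cancellation to land.
    streaming.cancel()
    await asyncio.gather(streaming, return_exceptions=True)
    steps.append("streaming cancelled")

    # Step 3d: shut down the thread pool executor.
    executor.shutdown(wait=True)
    steps.append("executor shut down")
    return steps
```

When the disconnect outlasts the timeout, the first recorded step is "stop timed out" and cleanup still runs to completion, which is the behaviour the timeout is meant to guarantee.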

Sync vs Async Entry Points

The run() method provides a synchronous wrapper: if the event loop is already running, it schedules run_async() as a task on that loop and returns without blocking; otherwise it blocks in loop.run_until_complete(). This dual-mode design accommodates both standalone deployments (where the node owns the loop) and embedded deployments (e.g., Jupyter notebooks where the loop is already running).
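
The dual-mode dispatch can be sketched in a few lines. DualModeNode is a hypothetical class written for illustration; it mirrors the branching described above but is not the library's TradingNode, and returning the created task from run() is an assumption of this sketch.

```python
import asyncio


class DualModeNode:
    """Hypothetical node showing the sync/async entry-point split."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        self.ran = False

    async def run_async(self) -> None:
        await asyncio.sleep(0)  # stand-in for the real run phase
        self.ran = True

    def run(self):
        if self.loop.is_running():
            # Embedded case: schedule on the running loop and return at once.
            return self.loop.create_task(self.run_async())
        # Standalone case: the node owns the loop and blocks until done.
        self.loop.run_until_complete(self.run_async())
        return None


# Standalone usage: run() blocks until run_async() has completed.
loop = asyncio.new_event_loop()
standalone = DualModeNode(loop)
standalone.run()
loop.close()


async def embedded() -> bool:
    # Embedded usage: the loop is already running, so run() returns a task.
    node = DualModeNode(asyncio.get_running_loop())
    task = node.run()
    await task
    return node.ran
```

In the embedded case the caller has to keep a reference to the scheduled work (or otherwise keep the loop alive), because run() returning does not mean the node has finished.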

Related Pages

Implementation:Nautechsystems_Nautilus_trader_TradingNode_Build_Run
