Heuristic:Ray project Ray Graceful Shutdown Timing

Knowledge Sources	Ray Serve Deployment Config
Domains	Serve, Reliability, Operations
Last Updated	2026-02-13 16:35 GMT

Overview

Coordinated timeout strategy for Ray Serve lifecycle events: 2-second shutdown poll loop, 20-second forceful kill timeout, 10-second health check interval, and 30-second health check timeout.

Description

Ray Serve manages the replica lifecycle with a set of coordinated timeouts. Shutdown is a two-phase process. In the graceful phase, the replica polls every 2 seconds to check whether all in-flight requests have completed. If requests are still pending after 20 seconds, the controller forcefully kills the replica (forceful phase). Health checks run every 10 seconds, and each check has a 30-second timeout before the replica is marked unhealthy. These defaults balance fast failure detection against tolerance for temporary slowdowns. Runtime environment setup has a separate 600-second (10-minute) timeout.
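The two-phase shutdown described above can be sketched as a small simulation. This is an illustration only, not Ray Serve's replica code: the class `GracefulShutdownSketch`, the `Outcome` enum, and the `shutdown` method are hypothetical names, and time is simulated by a counter rather than by sleeping.

```java
import java.util.function.IntSupplier;

/** Illustrative simulation of the two-phase shutdown; not Ray Serve's implementation. */
final class GracefulShutdownSketch {

    /** How the simulated shutdown ended. */
    enum Outcome { DRAINED, FORCE_KILLED }

    /**
     * Checks the in-flight request count once per waitLoopS seconds of simulated time.
     * Returns DRAINED as soon as the count reaches zero, or FORCE_KILLED once
     * timeoutS seconds have elapsed with requests still pending.
     */
    static Outcome shutdown(IntSupplier inFlightRequests, double waitLoopS, double timeoutS) {
        if (waitLoopS <= 0) {
            throw new IllegalArgumentException("waitLoopS must be positive");
        }
        double elapsedS = 0.0;
        while (true) {
            if (inFlightRequests.getAsInt() == 0) {
                return Outcome.DRAINED;       // graceful phase succeeded
            }
            if (elapsedS >= timeoutS) {
                return Outcome.FORCE_KILLED;  // forceful phase
            }
            elapsedS += waitLoopS;            // stands in for sleeping one poll interval
        }
    }

    public static void main(String[] args) {
        // A replica whose requests finish after a few polls drains gracefully.
        int[] remaining = {3};
        System.out.println(shutdown(() -> remaining[0] > 0 ? remaining[0]-- : 0, 2.0, 20.0));
        // A replica with a stuck request is killed once the timeout elapses.
        System.out.println(shutdown(() -> 1, 2.0, 20.0));
    }
}
```

With the default 2-second loop and 20-second timeout, a stuck request holds the replica for the full 20 seconds before the forceful phase begins, which is why the timeout, not the poll interval, is the value to tune for long-running requests.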

Usage

Apply this heuristic when:

- Configuring Serve deployments that handle long-running requests (increase `gracefulShutdownTimeoutS`)
- Experiencing premature replica kills during redeployment (increase `gracefulShutdownTimeoutS`)
- Seeing false-positive unhealthy replica alerts (increase `healthCheckTimeoutS`)
- Deploying models with slow initialization (increase runtime env `setupTimeoutSeconds`)

The Insight (Rule of Thumb)

Action: Tune the four coordinated timeouts based on your request and initialization profile:
- `gracefulShutdownWaitLoopS` = 2.0s (poll frequency during shutdown)
- `gracefulShutdownTimeoutS` = 20.0s (max time before forceful kill)
- `healthCheckPeriodS` = 10.0s (check frequency)
- `healthCheckTimeoutS` = 30.0s (max wait for health response)
Value: The defaults work for sub-second request latency. For long-running requests (e.g., LLM inference), increase `gracefulShutdownTimeoutS` to at least 2x your p99 request latency.
Trade-off: Longer timeouts delay failure detection and deployment updates. Shorter timeouts risk dropping in-flight requests or false unhealthy markings.
Runtime environment: `setupTimeoutSeconds` = 600 (10 minutes) for environment provisioning. Set to -1 to disable timeout entirely.
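The 2x p99 rule of thumb can be written as a small helper. This is a minimal sketch: the class `ShutdownTimeoutSizing` and the method `recommendedTimeoutS` are hypothetical, and never going below the 20-second default is an assumption added here, not something the Ray source prescribes.

```java
/** Illustrative sizing helper for gracefulShutdownTimeoutS; not part of Ray Serve. */
final class ShutdownTimeoutSizing {

    /** Mirrors the default quoted on this page (20.0 seconds). */
    static final double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;

    /**
     * Returns the larger of the default timeout and twice the observed p99 latency.
     * Keeping the default as a floor is an assumption of this sketch.
     */
    static double recommendedTimeoutS(double p99LatencyS) {
        if (p99LatencyS < 0) {
            throw new IllegalArgumentException("p99 latency must be non-negative");
        }
        return Math.max(DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S, 2.0 * p99LatencyS);
    }

    public static void main(String[] args) {
        // Sub-second web service: the default is already sufficient.
        System.out.println(recommendedTimeoutS(0.2));
        // LLM inference with a 45-second p99: the timeout should be raised.
        System.out.println(recommendedTimeoutS(45.0));
    }
}
```

The result would then be passed to whatever sets `gracefulShutdownTimeoutS` on the deployment config.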

Reasoning

The timeout hierarchy follows a defensive layering principle:

Shutdown poll loop (2s): Frequent polling ensures rapid detection when all requests drain. This is the happy path for graceful shutdown.
Shutdown timeout (20s): The 20-second cap prevents indefinite hangs from stuck requests. For most web services with sub-second latency, 20 seconds is generous. For ML inference (5-60s per request), this should be increased.
Health check period (10s): Frequent enough that a hung replica is detected within roughly 40 seconds in the worst case (one 10-second period plus the 30-second timeout), infrequent enough to avoid significant overhead.
Health check timeout (30s): 3x the check period, allowing for temporary GC pauses or load spikes without false positives.
Runtime env setup (600s): Model downloads and environment provisioning can be slow. 10 minutes accommodates large model downloads without being infinite.

The proxy startup timeout (60s) is separate and controls how long `Serve.start()` waits for the HTTP proxy to become ready.
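The arithmetic behind the timeout layering can be checked with a short sketch. It assumes a simplified model that is not taken from the Ray source: one health check is issued per period, and a replica is marked unhealthy when a single check exceeds its timeout. The class and method names are hypothetical.

```java
/** Illustrative arithmetic for the timeout hierarchy; the model is a simplifying assumption. */
final class TimeoutBudgetSketch {

    /** Number of poll intervals that fit into the graceful shutdown window. */
    static long pollIntervalsBeforeKill(double waitLoopS, double timeoutS) {
        return (long) Math.ceil(timeoutS / waitLoopS);
    }

    /**
     * Worst-case delay before a hung replica is marked unhealthy under the assumed model:
     * the failure happens just after a check passes, so one full period elapses
     * before the next check starts, and that check then runs to its timeout.
     */
    static double worstCaseDetectionS(double periodS, double timeoutS) {
        return periodS + timeoutS;
    }

    public static void main(String[] args) {
        // Defaults quoted on this page: 2s wait loop, 20s shutdown timeout.
        System.out.println(pollIntervalsBeforeKill(2.0, 20.0));
        // Defaults quoted on this page: 10s health check period, 30s health check timeout.
        System.out.println(worstCaseDetectionS(10.0, 30.0));
    }
}
```

Under this model, shortening `healthCheckTimeoutS` reduces the detection delay more than shortening `healthCheckPeriodS` does, at the cost of more false positives during GC pauses or load spikes.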

Code Evidence

Default timeout values from `Constants.java:45-55`:

public static final Double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;
public static final Double DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S = 2.0;
public static final Double DEFAULT_HEALTH_CHECK_PERIOD_S = 10.0;
public static final Double DEFAULT_HEALTH_CHECK_TIMEOUT_S = 30.0;

Proxy startup timeout from `Constants.java:43`:

/** Max time to wait for proxy in Serve.start. Unit: second */
public static final int PROXY_TIMEOUT_S = 60;

Deployment config using these defaults from `DeploymentConfig.java:37-53`:

private Double gracefulShutdownWaitLoopS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S;
private Double gracefulShutdownTimeoutS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S;
private Double healthCheckPeriodS = Constants.DEFAULT_HEALTH_CHECK_PERIOD_S;
private Double healthCheckTimeoutS = Constants.DEFAULT_HEALTH_CHECK_TIMEOUT_S;

Runtime env setup timeout from `RuntimeEnvConfig.java:8`:

private int setupTimeoutSeconds = 600;
