Heuristic:Ray project Ray Graceful Shutdown Timing
| Knowledge Sources | |
|---|---|
| Domains | Serve, Reliability, Operations |
| Last Updated | 2026-02-13 16:35 GMT |
Overview
Coordinated timeout strategy for Ray Serve lifecycle events: 2-second shutdown poll loop, 20-second forceful kill timeout, 10-second health check interval, and 30-second health check timeout.
Description
Ray Serve uses a set of coordinated timeouts to manage the replica lifecycle. Shutdown is a two-phase process. In the graceful phase, a replica checks every 2 seconds whether all in-flight requests have completed, and exits as soon as they have. If requests are still pending after 20 seconds, the controller forcefully kills the replica (forceful phase). Health checks run every 10 seconds, and each check has a 30-second timeout: a check that does not respond within that window counts as failed and can lead to the replica being marked unhealthy. These defaults balance fast failure detection against tolerance for temporary slowdowns. Runtime environment setup has a separate 600-second (10-minute) timeout.
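The two-phase sequence can be modeled as a small discrete-time simulation. This is a minimal sketch, not Ray Serve's implementation. The class `GracefulShutdownSketch` and its `ongoingAt` function (the number of in-flight requests `t` seconds after shutdown begins) are hypothetical. The model also assumes that the replica checks at exact multiples of the wait-loop interval and that the controller's kill fires exactly at the timeout.

```java
import java.util.function.DoubleToIntFunction;

// Hypothetical model of Serve's two-phase replica shutdown; not Ray's actual code.
class GracefulShutdownSketch {
    enum Outcome { DRAINED, FORCE_KILLED }

    record Result(Outcome outcome, double elapsedS) {}

    /**
     * Simulates shutdown in discrete steps instead of sleeping.
     * ongoingAt maps a time t (seconds since shutdown began) to the number of in-flight requests.
     */
    static Result simulate(DoubleToIntFunction ongoingAt, double waitLoopS, double timeoutS) {
        if (waitLoopS <= 0) {
            throw new IllegalArgumentException("waitLoopS must be positive");
        }
        // Graceful phase: check once per wait-loop interval, up to and including the timeout.
        for (int k = 1; k * waitLoopS <= timeoutS; k++) {
            double t = k * waitLoopS;
            if (ongoingAt.applyAsInt(t) == 0) {
                return new Result(Outcome.DRAINED, t); // all requests drained; exit cleanly
            }
        }
        // Forceful phase: the controller kills the replica once the timeout expires.
        return new Result(Outcome.FORCE_KILLED, timeoutS);
    }

    public static void main(String[] args) {
        // Requests finish 5 s into shutdown: the 2 s and 4 s checks see work, the 6 s check sees none.
        System.out.println(simulate(t -> t < 5.0 ? 3 : 0, 2.0, 20.0));
        // Requests need 25 s: the 20 s timeout expires first.
        System.out.println(simulate(t -> t < 25.0 ? 1 : 0, 2.0, 20.0));
    }
}
```

Simulating time instead of sleeping keeps the example deterministic. It also shows the cost of polling: a replica whose requests drain at 5 seconds only notices at the next 2-second check (6 seconds).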
Usage
Apply this heuristic when:
- Configuring Serve deployments that handle long-running requests (increase `gracefulShutdownTimeoutS`)
- Experiencing premature replica kills during redeployment (increase `gracefulShutdownTimeoutS`)
- Seeing false-positive unhealthy replica alerts (increase `healthCheckTimeoutS`)
- Deploying models whose runtime environment is slow to provision, e.g. large package or model downloads (increase runtime env `setupTimeoutSeconds`)
The Insight (Rule of Thumb)
- Action: Tune the four coordinated timeouts based on your request and initialization profile:
  - `gracefulShutdownWaitLoopS` = 2.0s (poll frequency during shutdown)
  - `gracefulShutdownTimeoutS` = 20.0s (max time before forceful kill)
  - `healthCheckPeriodS` = 10.0s (check frequency)
  - `healthCheckTimeoutS` = 30.0s (max wait for health response)
- Value: The defaults work for sub-second request latency. For long-running requests (e.g., LLM inference), increase `gracefulShutdownTimeoutS` to at least 2x your p99 request latency.
- Trade-off: Longer timeouts delay failure detection and deployment updates. Shorter timeouts risk dropping in-flight requests or marking healthy replicas as unhealthy (false positives).
- Runtime environment: `setupTimeoutSeconds` = 600 (10 minutes) for environment provisioning. Set to -1 to disable timeout entirely.
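The 2x-p99 rule of thumb comes down to a one-line calculation. The sketch below is a hypothetical helper: `ShutdownTimeoutAdvisor` is not a Ray class. It assumes you already measure p99 latency in seconds. Keeping the 20-second default as a floor is an assumption made here, not something the heuristic states.

```java
// Hypothetical helper applying the "at least 2x p99 latency" rule of thumb; not part of Ray.
class ShutdownTimeoutAdvisor {
    // Mirrors the documented default for gracefulShutdownTimeoutS.
    static final double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;

    /** Returns the larger of the default timeout and twice the observed p99 latency. */
    static double recommendedGracefulShutdownTimeoutS(double p99LatencyS) {
        // Rejects negative values and NaN (NaN fails every comparison).
        if (!(p99LatencyS >= 0)) {
            throw new IllegalArgumentException("p99 latency must be a non-negative number");
        }
        return Math.max(DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S, 2.0 * p99LatencyS);
    }

    public static void main(String[] args) {
        // Web service with sub-second latency: the default already covers it.
        System.out.println(recommendedGracefulShutdownTimeoutS(0.3));
        // Long-running LLM inference: the timeout grows to twice the p99 latency.
        System.out.println(recommendedGracefulShutdownTimeoutS(45.0));
    }
}
```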
Reasoning
The timeout hierarchy follows a defensive layering principle:
- Shutdown poll loop (2s): Frequent polling ensures rapid detection when all requests drain. This is the happy path for graceful shutdown.
- Shutdown timeout (20s): The 20-second cap prevents indefinite hangs from stuck requests. For most web services with sub-second latency, 20 seconds is generous. For ML inference (5-60s per request), this should be increased.
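- Health check period (10s): Frequent enough that a hung replica produces a failed check within roughly 40 seconds in the worst case, yet infrequent enough to avoid significant overhead. The hang may begin just after a successful check, so the next check starts up to 10 seconds later and then runs for the full 30-second timeout before failing.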
- Health check timeout (30s): 3x the check period, allowing for temporary GC pauses or load spikes without false positives.
- Runtime env setup (600s): Model downloads and environment provisioning can be slow. 10 minutes accommodates large model downloads without being infinite.
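The health-check arithmetic in the list above can be written out explicitly. This sketch models a single failed check only. `HealthCheckTiming` is a hypothetical class, and the model assumes checks are launched once per period, which may differ from how the controller actually schedules them.

```java
// Hypothetical arithmetic for the health-check timings; not Ray code.
class HealthCheckTiming {
    /**
     * Worst-case seconds from a replica hanging until one health check is recorded as failed.
     * Assumes the hang starts just after a successful check: the next check begins up to one
     * period later and then runs for the full timeout before failing.
     */
    static double worstCaseFailedCheckS(double periodS, double timeoutS) {
        return periodS + timeoutS;
    }

    /** Number of check periods that fit in one timeout (headroom against false positives). */
    static double timeoutToPeriodRatio(double periodS, double timeoutS) {
        return timeoutS / periodS;
    }

    public static void main(String[] args) {
        double periodS = 10.0;  // healthCheckPeriodS default
        double timeoutS = 30.0; // healthCheckTimeoutS default
        System.out.println("worst case to first failed check (s): " + worstCaseFailedCheckS(periodS, timeoutS));
        System.out.println("timeout / period: " + timeoutToPeriodRatio(periodS, timeoutS));
    }
}
```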
The proxy startup timeout (60s) is separate and controls how long `Serve.start()` waits for the HTTP proxy to become ready.
Code Evidence
Default timeout values from `Constants.java:45-55`:
```java
public static final Double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;
public static final Double DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S = 2.0;
public static final Double DEFAULT_HEALTH_CHECK_PERIOD_S = 10.0;
public static final Double DEFAULT_HEALTH_CHECK_TIMEOUT_S = 30.0;
```
Proxy startup timeout from `Constants.java:43`:
```java
/** Max time to wait for proxy in Serve.start. Unit: second */
public static final int PROXY_TIMEOUT_S = 60;
```
Deployment config using these defaults from `DeploymentConfig.java:37-53`:
```java
private Double gracefulShutdownWaitLoopS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S;
private Double gracefulShutdownTimeoutS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S;
private Double healthCheckPeriodS = Constants.DEFAULT_HEALTH_CHECK_PERIOD_S;
private Double healthCheckTimeoutS = Constants.DEFAULT_HEALTH_CHECK_TIMEOUT_S;
```
Runtime env setup timeout from `RuntimeEnvConfig.java:8`:
```java
private int setupTimeoutSeconds = 600;
```
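The `-1` sentinel for `setupTimeoutSeconds` can be read as "no deadline". The sketch below is a hypothetical interpretation: `SetupTimeoutSketch` is not a Ray class. Mapping the integer to an `Optional<Duration>` and rejecting other non-positive values are assumptions made for illustration, not documented Ray behavior.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical interpretation of RuntimeEnvConfig.setupTimeoutSeconds; not Ray's code.
class SetupTimeoutSketch {
    static final int DISABLED = -1; // documented sentinel: -1 disables the timeout

    /** Maps the integer setting to a deadline; an empty Optional means "wait indefinitely". */
    static Optional<Duration> toDeadline(int setupTimeoutSeconds) {
        if (setupTimeoutSeconds == DISABLED) {
            return Optional.empty();
        }
        if (setupTimeoutSeconds <= 0) {
            // Assumption for this sketch: other non-positive values are invalid.
            throw new IllegalArgumentException("setupTimeoutSeconds must be positive or -1");
        }
        return Optional.of(Duration.ofSeconds(setupTimeoutSeconds));
    }

    public static void main(String[] args) {
        System.out.println(toDeadline(600)); // default: a 10-minute deadline
        System.out.println(toDeadline(-1));  // timeout disabled
    }
}
```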