Heuristic:Ray project Ray Graceful Shutdown Timing

From Leeroopedia
Domains: Serve, Reliability, Operations
Last Updated: 2026-02-13 16:35 GMT

Overview

Coordinated timeout strategy for Ray Serve lifecycle events: 2-second shutdown poll loop, 20-second forceful kill timeout, 10-second health check interval, and 30-second health check timeout.

Description

Ray Serve uses a set of coordinated timeouts to manage the replica lifecycle. Shutdown is a two-phase process. In the graceful phase, the replica polls every 2 seconds to check whether all in-flight requests have completed. If requests are still pending after 20 seconds, the controller kills the replica (the forceful phase). Health checks run every 10 seconds, and each check has a 30-second timeout before the replica is marked unhealthy. These defaults balance fast failure detection against tolerance for temporary slowdowns. Runtime environment setup has its own, separate timeout of 600 seconds (10 minutes).
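The two-phase sequence can be sketched as a small model. The Java below is illustrative only and is not Ray's implementation: `ShutdownModel`, `simulate`, and `drainTimeS` are hypothetical names, and the model assumes the replica sleeps one poll interval before each check.

```java
/** Toy model of the two-phase shutdown. Illustrative only, not Ray's code. */
final class ShutdownModel {

    /** Outcome of one simulated shutdown. */
    static final class Result {
        final double stoppedAtS; // seconds after shutdown was requested
        final boolean graceful;  // true if requests drained before the cap

        Result(double stoppedAtS, boolean graceful) {
            this.stoppedAtS = stoppedAtS;
            this.graceful = graceful;
        }
    }

    /**
     * @param drainTimeS seconds until the last in-flight request finishes (hypothetical input)
     * @param waitLoopS  poll interval, playing the role of gracefulShutdownWaitLoopS
     * @param timeoutS   cap before the forceful kill, playing the role of gracefulShutdownTimeoutS
     */
    static Result simulate(double drainTimeS, double waitLoopS, double timeoutS) {
        if (waitLoopS <= 0 || timeoutS < 0) {
            throw new IllegalArgumentException("waitLoopS must be > 0 and timeoutS >= 0");
        }
        // Graceful phase. Assumption: sleep one interval, then check for pending requests.
        for (int poll = 1; poll * waitLoopS <= timeoutS; poll++) {
            double now = poll * waitLoopS;
            if (drainTimeS <= now) {
                return new Result(now, true);
            }
        }
        // Forceful phase: the cap was reached with requests still pending.
        return new Result(timeoutS, false);
    }
}
```

Under these assumptions and the default values, a replica whose last request finishes after 0.3 seconds stops at the first poll (2 seconds), while one that needs 45 seconds to drain is killed at the 20-second cap.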

Usage

Apply this heuristic when:

  • Configuring Serve deployments that handle long-running requests (increase `gracefulShutdownTimeoutS`)
  • Experiencing premature replica kills during redeployment (increase `gracefulShutdownTimeoutS`)
  • Seeing false-positive unhealthy replica alerts (increase `healthCheckTimeoutS`)
  • Deploying models with slow initialization (increase runtime env `setupTimeoutSeconds`)

The Insight (Rule of Thumb)

  • Action: Tune the four coordinated timeouts based on your request and initialization profile:
    • `gracefulShutdownWaitLoopS` = 2.0s (poll frequency during shutdown)
    • `gracefulShutdownTimeoutS` = 20.0s (max time before forceful kill)
    • `healthCheckPeriodS` = 10.0s (check frequency)
    • `healthCheckTimeoutS` = 30.0s (max wait for health response)
  • Value: The defaults work for sub-second request latency. For long-running requests (e.g., LLM inference), increase `gracefulShutdownTimeoutS` to at least 2x your p99 request latency.
  • Trade-off: Longer timeouts delay failure detection and deployment updates. Shorter timeouts risk dropping in-flight requests or false unhealthy markings.
  • Runtime environment: `setupTimeoutSeconds` = 600 (10 minutes) for environment provisioning. Set to -1 to disable timeout entirely.
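The 2x p99 rule of thumb above reduces to a one-line calculation. The helper below is a minimal sketch: `ShutdownTimeoutAdvisor` and `recommendedTimeoutS` are hypothetical names, and never going below the 20-second default is an assumption of this sketch, not something the source prescribes.

```java
/** Sketch of the "at least 2x p99" sizing rule. Names are hypothetical. */
final class ShutdownTimeoutAdvisor {

    // Default quoted in this article for gracefulShutdownTimeoutS.
    static final double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;

    /**
     * Returns a candidate gracefulShutdownTimeoutS for a measured p99 latency.
     * Assumption: keep the default as a floor so fast services are unaffected.
     */
    static double recommendedTimeoutS(double p99LatencyS) {
        if (Double.isNaN(p99LatencyS) || p99LatencyS < 0) {
            throw new IllegalArgumentException("p99LatencyS must be a non-negative number");
        }
        return Math.max(DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S, 2.0 * p99LatencyS);
    }
}
```

For a service with a 45-second p99, this yields 90 seconds. For a service with a 0.2-second p99, it keeps the 20-second default.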

Reasoning

The timeout hierarchy follows a defensive layering principle:

  1. Shutdown poll loop (2s): Frequent polling ensures rapid detection when all requests drain. This is the happy path for graceful shutdown.
  2. Shutdown timeout (20s): The 20-second cap prevents indefinite hangs from stuck requests. For most web services with sub-second latency, 20 seconds is generous. For ML inference (5-60s per request), this should be increased.
  3. Health check period (10s): Frequent enough that a hung replica is caught within roughly 40 seconds in the worst case (up to 10 seconds until the next check, plus the 30-second timeout), yet infrequent enough to avoid significant overhead.
  4. Health check timeout (30s): 3x the check period, allowing for temporary GC pauses or load spikes without false positives.
  5. Runtime env setup (600s): Model downloads and environment provisioning can be slow. 10 minutes accommodates large downloads without waiting indefinitely.
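As a worked check on the health-check numbers, the worst-case time to notice a hung replica is the wait until the next check plus the check timeout. The sketch below uses hypothetical names (`HealthCheckMath`, `worstCaseHangDetectionS`) and assumes a single timed-out check is enough to mark the replica unhealthy. If the controller requires several consecutive failures, detection takes longer.

```java
/** Back-of-the-envelope health-check arithmetic. Names are hypothetical. */
final class HealthCheckMath {

    /**
     * Worst case for a replica that hangs right after a successful check:
     * wait one full period for the next check, then wait for it to time out.
     * Assumption: one timed-out check marks the replica unhealthy.
     */
    static double worstCaseHangDetectionS(double periodS, double timeoutS) {
        if (periodS <= 0 || timeoutS <= 0) {
            throw new IllegalArgumentException("periodS and timeoutS must be > 0");
        }
        return periodS + timeoutS;
    }

    /** Ratio of timeout to period, the "3x" mentioned in the text for the defaults. */
    static double timeoutToPeriodRatio(double periodS, double timeoutS) {
        if (periodS <= 0) {
            throw new IllegalArgumentException("periodS must be > 0");
        }
        return timeoutS / periodS;
    }
}
```

With the defaults (10-second period, 30-second timeout), the worst case is 40 seconds and the ratio is 3.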

The proxy startup timeout (60s) is separate and controls how long `Serve.start()` waits for the HTTP proxy to become ready.

Code Evidence

Default timeout values from `Constants.java:45-55`:

public static final Double DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S = 20.0;
public static final Double DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S = 2.0;
public static final Double DEFAULT_HEALTH_CHECK_PERIOD_S = 10.0;
public static final Double DEFAULT_HEALTH_CHECK_TIMEOUT_S = 30.0;

Proxy startup timeout from `Constants.java:43`:

/** Max time to wait for proxy in Serve.start. Unit: second */
public static final int PROXY_TIMEOUT_S = 60;

Deployment config using these defaults from `DeploymentConfig.java:37-53`:

private Double gracefulShutdownWaitLoopS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_WAIT_LOOP_S;
private Double gracefulShutdownTimeoutS = Constants.DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S;
private Double healthCheckPeriodS = Constants.DEFAULT_HEALTH_CHECK_PERIOD_S;
private Double healthCheckTimeoutS = Constants.DEFAULT_HEALTH_CHECK_TIMEOUT_S;

Runtime env setup timeout from `RuntimeEnvConfig.java:8`:

private int setupTimeoutSeconds = 600;
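The `-1` sentinel mentioned earlier can be interpreted as in the sketch below. This is not Ray's code: `SetupTimeout` and `hasTimedOut` are hypothetical names, and rejecting zero and other negative values is an assumption of this sketch.

```java
/** Sketch of how a -1 "no timeout" sentinel might be interpreted. Names are hypothetical. */
final class SetupTimeout {

    static final int NO_TIMEOUT = -1;

    /**
     * Returns true if setup has run longer than the configured limit.
     * A limit of -1 means setup never times out.
     * Assumption: any other non-positive value is a configuration error.
     */
    static boolean hasTimedOut(long elapsedSeconds, int setupTimeoutSeconds) {
        if (setupTimeoutSeconds == NO_TIMEOUT) {
            return false;
        }
        if (setupTimeoutSeconds <= 0) {
            throw new IllegalArgumentException("setupTimeoutSeconds must be positive or -1");
        }
        return elapsedSeconds > setupTimeoutSeconds;
    }
}
```

With the default of 600, setup that has run for 601 seconds counts as timed out. With `-1`, no elapsed time ever does.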
