Principle:Ray project Ray Serve Graceful Shutdown
| Knowledge Sources | |
|---|---|
| Domains | Model_Serving, Resource_Management |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
A graceful teardown mechanism that stops all deployment replicas, terminates the serving controller, and releases cluster resources.
Description
Serve Graceful Shutdown stops the entire serving infrastructure. It sends a shutdown RPC to the Serve controller, which terminates all replica actors and proxy actors. The client then nullifies its reference to the controller. The shutdown handles edge cases like the controller already being dead (RayActorException) and timeouts (RayTimeoutException).
Usage
Call shutdown when the serving application is no longer needed. All in-flight requests should be completed before shutdown.
Theoretical Basis
Graceful shutdown follows the drain then stop pattern:
- Stop accepting new requests
- Wait for in-flight requests to complete (with timeout)
- Terminate replica actors
- Terminate controller actor
- Release resources