Workflow:Ray project Ray Actor Lifecycle Management

From Leeroopedia
Revision as of 11:05, 16 February 2026 by Admin (talk | contribs) (Auto-imported from workflows/Ray_project_Ray_Actor_Lifecycle_Management.md)
Knowledge Sources
Domains: Distributed_Computing, Actor_Model, Stateful_Computation
Last Updated: 2026-02-13 16:00 GMT

Overview

End-to-end process for creating, invoking, and managing stateful Ray actors that maintain persistent state across method calls in a distributed cluster.

Description

This workflow outlines the complete lifecycle of Ray actors, from creation through method invocation to termination. Actors are the primary mechanism for stateful distributed computation in Ray. Each actor is instantiated on a remote worker, maintains internal state across sequential method calls, and is addressable by handle. The workflow covers actor creation via constructor or factory methods, method invocation patterns (synchronous and asynchronous), named actor registration for cross-job discovery, resource-constrained placement, restart policies, concurrency groups, and graceful or forced termination.

Usage

Execute this workflow when you need stateful computation distributed across a cluster. Typical use cases include counters, accumulators, caches, and model inference servers, where internal state must persist between calls. It also applies when multiple tasks need to coordinate through shared mutable state, or when you need long-running services that are addressable by name.

Execution Steps

Step 1: Initialize Ray Runtime

Start the Ray runtime to establish the cluster connection. The runtime bootstraps the task submitter, object store, function manager, and worker context required for actor operations. This step is identical to the task execution workflow.

Key considerations:

  • Must be called before any actor operations
  • Configures default actor lifetime (detached vs non-detached) via job config

Step 2: Define Actor Class

Define the class that will serve as the actor. The class constructor initializes internal state, and methods provide the interface for interacting with that state. In Java, actors use standard class constructors referenced via method references. In Python, classes are decorated with @ray.remote.

Key considerations:

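  • Actor classes must be shippable to remote workers: in Python the class definition is pickled, so it must be serializable; in Java the class must be on the workers' classpath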
  • Constructor arguments are serialized and sent to the remote worker
  • Methods are executed sequentially on a single thread by default
  • Concurrency groups can be defined to allow parallel execution of specific methods
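
The default single-threaded execution model can be illustrated without Ray. The following is a minimal standard-library sketch, not Ray's API: `ToyActor` and `Counter` are hypothetical names, a daemon thread plays the role of the remote worker, and `concurrent.futures.Future` stands in for the result reference.

```python
import queue
import threading
from concurrent.futures import Future


class Counter:
    """Plain class whose instance state should persist across calls."""

    def __init__(self, start=0):
        self.value = start

    def increment(self, by=1):
        self.value += by
        return self.value


class ToyActor:
    """Hypothetical stand-in for an actor: one instance, one worker thread, one mailbox."""

    def __init__(self, cls, *args, **kwargs):
        self._instance = cls(*args, **kwargs)
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Drain the mailbox one call at a time, so methods never overlap.
        while True:
            method, args, future = self._mailbox.get()
            try:
                future.set_result(getattr(self._instance, method)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, method, *args):
        # Enqueue the call and return immediately with a future for the result.
        future = Future()
        self._mailbox.put((method, args, future))
        return future


actor = ToyActor(Counter, 10)
futures = [actor.call("increment") for _ in range(5)]
results = [f.result(timeout=5) for f in futures]
print(results)  # [11, 12, 13, 14, 15]
```

Because a single thread drains a FIFO queue, the five increments cannot interleave, which is why the counter needs no lock in this sketch.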

Step 3: Create Actor Instance

Instantiate the actor on a remote worker by calling the actor creation API with a constructor or factory method reference and its arguments. The runtime resolves the function descriptor, serializes the arguments, and submits an actor creation task to the scheduler. The scheduler places the actor on a node with sufficient resources (CPU, GPU, custom resources). The call returns an ActorHandle that is used for subsequent method calls.

Key considerations:

  • Named actors can be registered with a string name for cross-job discovery
  • Detached actors survive the creating job and persist until explicitly killed
  • Resource requirements (CPU, GPU) constrain placement decisions
  • The max-restarts setting configures automatic recovery on failure
  • Placement groups can co-locate related actors
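
Resource-constrained placement can be pictured with a toy first-fit scheduler. This is a sketch under stated assumptions: `Node` and `place_actor` are hypothetical names, and first-fit is chosen only for brevity; it is not a description of Ray's actual scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    available: dict = field(default_factory=dict)


def place_actor(nodes, required):
    """Return the first node that satisfies every requested resource, reserving it."""
    for node in nodes:
        if all(node.available.get(res, 0) >= amount for res, amount in required.items()):
            for res, amount in required.items():
                node.available[res] -= amount
            return node.name
    return None  # no node currently has enough free resources


nodes = [
    Node("node-a", {"CPU": 2}),
    Node("node-b", {"CPU": 4, "GPU": 1}),
]

first = place_actor(nodes, {"CPU": 1, "GPU": 1})  # only node-b has a GPU
second = place_actor(nodes, {"CPU": 2})           # node-a still has 2 CPUs free
third = place_actor(nodes, {"GPU": 1})            # the only GPU is already reserved
print(first, second, third)  # node-b node-a None
```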

Step 4: Invoke Actor Methods

Call methods on the actor via its handle. Each method call is submitted as an actor task, serialized with its arguments, and queued for execution on the actor's worker; the call returns an ObjectRef for the result. Calls submitted to the same actor by the same caller execute sequentially in submission order, which keeps state consistent. Calls to different actors can run concurrently.

Key considerations:

  • Methods execute sequentially by default to preserve state consistency
  • In Java, overloaded methods require explicit type casting of the method reference for disambiguation
  • Actor handles can be passed as arguments to other tasks or actors
  • Back-pressure limits pending calls to prevent queue overflow (PendingCallsLimitExceededException)
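
The back-pressure behaviour can be sketched as a bounded mailbox. Everything here is hypothetical and built on the standard library: `BoundedMailbox`, `Accumulator`, and the local `PendingCallsLimitExceeded` class only mimic the idea of rejecting calls once a pending-call limit is reached, and calls are executed manually with `run_next` to keep the example deterministic.

```python
import queue
from concurrent.futures import Future


class PendingCallsLimitExceeded(Exception):
    """Hypothetical error mirroring the exception named above."""


class BoundedMailbox:
    """Toy mailbox that rejects new calls once too many are pending."""

    def __init__(self, max_pending_calls):
        self._calls = queue.Queue(maxsize=max_pending_calls)

    def submit(self, method, *args):
        future = Future()
        try:
            self._calls.put_nowait((method, args, future))
        except queue.Full:
            raise PendingCallsLimitExceeded(
                f"limit of {self._calls.maxsize} pending calls reached"
            ) from None
        return future

    def run_next(self, instance):
        # Execute the oldest pending call, freeing one slot in the mailbox.
        method, args, future = self._calls.get_nowait()
        future.set_result(getattr(instance, method)(*args))


class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total


mailbox = BoundedMailbox(max_pending_calls=2)
acc = Accumulator()
f1 = mailbox.submit("add", 1)
f2 = mailbox.submit("add", 2)
try:
    mailbox.submit("add", 3)  # third pending call exceeds the limit
    rejected = False
except PendingCallsLimitExceeded:
    rejected = True

mailbox.run_next(acc)          # executes add(1), freeing a slot
f3 = mailbox.submit("add", 3)  # accepted now
mailbox.run_next(acc)
mailbox.run_next(acc)
print(rejected, f1.result(), f2.result(), f3.result())  # True 1 3 6
```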

Step 5: Retrieve Named Actors

Look up previously created named actors by their registered name and, optionally, a namespace. This enables cross-job and cross-driver access to long-running actors. In the Java API, the lookup returns an Optional handle that is empty if the actor does not exist.

Key considerations:

  • Named actors must have been created with setName()
  • Namespace isolation prevents name collisions between applications
  • Returns Optional.empty() if actor is not found
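
Namespace-scoped lookup can be modelled with a dictionary keyed by namespace and name. This is a toy sketch, not Ray's name service: `ActorRegistry` and its methods are hypothetical, `None` plays the role of an empty Optional, and rejecting duplicate names within a namespace is an assumption of the sketch.

```python
from typing import Optional


class ActorRegistry:
    """Toy name service keyed by (namespace, name)."""

    def __init__(self):
        self._handles = {}

    def register(self, name, handle, namespace="default"):
        key = (namespace, name)
        if key in self._handles:
            raise ValueError(f"actor {name!r} already exists in namespace {namespace!r}")
        self._handles[key] = handle

    def get_actor(self, name, namespace="default") -> Optional[object]:
        # None plays the role of an empty Optional.
        return self._handles.get((namespace, name))


registry = ActorRegistry()
registry.register("counter", "handle-for-app-a", namespace="app_a")
registry.register("counter", "handle-for-app-b", namespace="app_b")  # same name, no collision

found = registry.get_actor("counter", namespace="app_a")
missing = registry.get_actor("counter", namespace="app_c")
print(found, missing)  # handle-for-app-a None
```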

Step 6: Terminate Actor

End the actor's lifecycle through graceful exit or forced kill. A graceful exit (exitActor) lets the actor finish its current task and clean up resources, whereas a forced kill terminates the actor process immediately. Once the actor has terminated and will not be restarted, its handle is no longer usable: further method calls fail with RayActorException, which surfaces when the result is retrieved.

Key considerations:

  • exitActor() can only be called from within the actor itself
  • kill() can be called externally via the actor handle
  • kill() with noRestart=false allows the actor to restart if max restarts > 0
  • Outstanding ObjectRefs from killed actors resolve to RayActorException
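
The interaction between a forced kill and the restart budget can be sketched as a small state machine. The names `ToyActorRecord` and `ActorDiedError` are hypothetical, and raising the error at call time (rather than when a result is retrieved) is a simplification of the sketch.

```python
class ActorDiedError(Exception):
    """Hypothetical stand-in for the actor-death error named above."""


class ToyActorRecord:
    """Tracks liveness and the remaining restart budget of one actor."""

    def __init__(self, max_restarts=0):
        self.restarts_left = max_restarts
        self.alive = True
        self.incarnation = 0

    def kill(self, no_restart=True):
        # A forced kill ends the current incarnation immediately.
        self.alive = False
        if not no_restart and self.restarts_left > 0:
            self.restarts_left -= 1
            self.incarnation += 1
            self.alive = True  # restarted with fresh state

    def call(self, method):
        if not self.alive:
            raise ActorDiedError(f"cannot run {method!r}: actor is dead")
        return f"{method} ran on incarnation {self.incarnation}"


record = ToyActorRecord(max_restarts=1)
record.kill(no_restart=False)  # budget 1 -> 0, actor restarts
after_restart = record.call("ping")
record.kill(no_restart=False)  # budget exhausted, actor stays dead
try:
    record.call("ping")
    died = False
except ActorDiedError:
    died = True
print(after_restart, died)  # ping ran on incarnation 1 True
```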
