Implementation:Allenai Open instruct Ray Node Setup
| Type | External Tool Doc (Shell Script) |
|---|---|
| Source | configs/beaker_configs/ray_node_setup.sh:L1-49 |
| Dependencies | ray, docker, beaker |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for initializing Ray head and worker nodes in a Beaker-managed multi-node cluster, provided by the Open Instruct repository.
Description
ray_node_setup.sh is a shell script that runs on every replica in a Beaker job. It uses the BEAKER_REPLICA_RANK environment variable to decide whether the current replica is the head node (rank 0) or a worker node. On the head node, it starts the Ray head process. On worker nodes, it starts a Ray worker process that connects to the head, then enters a monitoring loop that polls the head's availability every 5 seconds. When the head becomes unreachable, which the script treats as a sign that training has completed, workers exit with code 0 to signal clean completion to the Beaker job manager.
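The role selection and the worker-side monitoring loop described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `node_role` and `wait_until_unreachable` are hypothetical helper names, and the health check is passed in as a command so the sketch does not assume any particular Ray invocation.

```shell
#!/bin/bash
# Sketch only: node_role and wait_until_unreachable are hypothetical helpers,
# not functions defined in ray_node_setup.sh.

# Print "head" for rank 0 and "worker" for any other rank.
# Treating a missing rank as 0 is an assumption made for this sketch.
node_role() {
    local rank="${1:-0}"
    if [ "$rank" -eq 0 ]; then
        echo "head"
    else
        echo "worker"
    fi
}

# Run the given check command every $interval seconds and return 0 as soon
# as the check fails, i.e. once the head is considered unreachable.
wait_until_unreachable() {
    local interval="$1"
    shift
    while "$@"; do
        sleep "$interval"
    done
    return 0
}
```

On a worker, the loop would be called with a 5-second interval and whatever command the script uses to probe the head, followed by `exit 0` so that Beaker records a clean completion.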
The script also prepares the process environment:
- Sets NCCL_CUMEM_ENABLE=0 for vLLM performance (turns off NCCL's use of the CUDA cuMem allocation API).
- Sets PYTHONPATH to the repository root.
- Creates the Triton autotune cache directory to silence warnings.
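The environment preparation can be sketched as a single function. `setup_env` is a hypothetical name, and the default cache path below is an assumed placeholder; the path the script actually creates may differ.

```shell
#!/bin/bash
# Sketch only: setup_env is a hypothetical helper, and the default cache
# directory is an assumption, not necessarily the path the script uses.
setup_env() {
    local repo_path="$1"
    local cache_dir="${2:-$HOME/.triton/autotune}"

    export NCCL_CUMEM_ENABLE=0      # turn off NCCL's cuMem-based allocation
    export PYTHONPATH="$repo_path"  # make the repository importable
    mkdir -p "$cache_dir"           # pre-create the autotune cache directory
}
```

Because the variables are exported, they are visible to the Ray processes and Python training code started afterwards in the same shell.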
Usage
This script is sourced or executed at the beginning of every Beaker-based GRPO training job. It must run before any Python training code that creates Ray actors.
Code Reference
Source Location
- Repository: Open Instruct
- File: configs/beaker_configs/ray_node_setup.sh
Signature
#!/bin/bash
# No function signature; this is an entry-point script.
# Key environment inputs:
# BEAKER_REPLICA_RANK - 0 for head, >0 for workers
# BEAKER_LEADER_REPLICA_HOSTNAME - hostname of the head node
# REPO_PATH - path to the repository root
Import
# Sourced at the top of a Beaker launch script:
source configs/beaker_configs/ray_node_setup.sh
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| BEAKER_REPLICA_RANK | Environment variable (int) | Rank of the current replica; 0 = head node |
| BEAKER_LEADER_REPLICA_HOSTNAME | Environment variable (string) | Hostname of the leader replica, resolved to IP via getent hosts |
| REPO_PATH | Environment variable (string) | Path to the repository for PYTHONPATH |
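Resolving the leader hostname to an IP with getent hosts might look like the sketch below. `first_address` and `resolve_leader_ip` are hypothetical helper names; the only assumption about getent is its standard output format, with the address in the first column.

```shell
#!/bin/bash
# Sketch only: first_address and resolve_leader_ip are hypothetical helpers.

# Read `getent hosts` output on stdin and print the address column
# of the first line.
first_address() {
    awk 'NR == 1 { print $1 }'
}

# Resolve a hostname to an IP address; returns non-zero if the lookup
# produces no address.
resolve_leader_ip() {
    local ip
    ip="$(getent hosts "$1" | first_address)"
    [ -n "$ip" ] && echo "$ip"
}
```

A worker could then call `resolve_leader_ip "$BEAKER_LEADER_REPLICA_HOSTNAME"` and stop early if it fails, rather than trying to join a cluster at an empty address.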
Outputs
| Name | Type | Description |
|---|---|---|
| Running Ray process | Side effect | A Ray head or worker process is started on the current machine |
| Exit code 0 | Process exit | Workers exit cleanly when head becomes unreachable |
Usage Examples
# In a Beaker experiment YAML, this script is the entry point:
# On replica 0 (head):
export BEAKER_REPLICA_RANK=0
export BEAKER_LEADER_REPLICA_HOSTNAME="node-0.internal"
export REPO_PATH="/workspace/open-instruct"
source configs/beaker_configs/ray_node_setup.sh
# Head node is now running; proceed to launch training
# On replica 1 (worker):
export BEAKER_REPLICA_RANK=1
export BEAKER_LEADER_REPLICA_HOSTNAME="node-0.internal"
export REPO_PATH="/workspace/open-instruct"
source configs/beaker_configs/ray_node_setup.sh
# Worker joins the cluster and monitors the head