Implementation:Allenai Open instruct Ray Node Setup

Type: External Tool Doc (Shell Script)
Source: configs/beaker_configs/ray_node_setup.sh:L1-49
Dependencies: ray, docker, beaker
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool for initializing Ray head and worker nodes in a Beaker-managed multi-node cluster, provided by the Open Instruct repository.

Description

ray_node_setup.sh is a shell script that runs on every replica in a Beaker job. It uses the BEAKER_REPLICA_RANK environment variable to decide whether the current replica is the head node (rank 0) or a worker node (any other rank). On the head node, it starts the Ray head process. On a worker node, it starts a Ray worker process that connects to the head, then enters a monitoring loop that checks every 5 seconds whether the head is still reachable. Once the head becomes unreachable, which the script treats as the sign that training has completed, the worker exits with code 0 to signal clean completion to the Beaker job manager.
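
The control flow described above can be sketched roughly as follows. This is an illustrative outline, not the contents of ray_node_setup.sh: the function names (node_role, wait_until_head_gone, head_is_up, start_ray_node), the HEAD_IP variable, and the port 6379 are assumptions made for the example. ray start and ray status are standard Ray CLI commands, but the exact flags the real script passes are not shown on this page.

```shell
#!/bin/bash
# Illustrative sketch only; names and the port below are assumptions.

RAY_PORT="${RAY_PORT:-6379}"  # assumed port, not taken from the real script

# Map a replica rank to its role: rank 0 is the head, anything else a worker.
node_role() {
    if [ "${1:-0}" = "0" ]; then
        echo head
    else
        echo worker
    fi
}

# Block while the probe command keeps succeeding, polling every $2 seconds.
wait_until_head_gone() {
    local probe="$1" interval="${2:-5}"
    while "$probe" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# Probe used by workers: succeeds while the head still answers.
head_is_up() {
    ray status --address="${HEAD_IP}:${RAY_PORT}"
}

# Start the right Ray process for this replica (HEAD_IP is hypothetical).
start_ray_node() {
    if [ "$(node_role "${BEAKER_REPLICA_RANK:-0}")" = "head" ]; then
        ray start --head --port="${RAY_PORT}"
    else
        ray start --address="${HEAD_IP}:${RAY_PORT}"
        wait_until_head_gone head_is_up 5
        exit 0  # head unreachable: report clean completion
    fi
}
```

Calling start_ray_node on each replica would approximate the behavior described above. Passing the probe as a parameter is a design choice of this sketch: it lets the polling loop be exercised without a live cluster.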

The script also prepares the runtime environment:

  • Sets NCCL_CUMEM_ENABLE=0, which matters for vLLM performance (the variable controls whether NCCL allocates memory through the CUDA cuMem API; 0 disables it).
  • Sets PYTHONPATH to the repository root (REPO_PATH).
  • Creates the Triton autotune cache directory to silence warnings.
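
A minimal sketch of this environment preparation is shown below. The function name prepare_ray_env is hypothetical, and the cache path $HOME/.triton/autotune is an assumption, since this page does not state where the script creates the directory.

```shell
#!/bin/bash
# Sketch of the environment preparation; not the script's actual text.

prepare_ray_env() {
    # Disable NCCL's cuMem-based allocation, as described for vLLM above.
    export NCCL_CUMEM_ENABLE=0

    # Make the repository importable; REPO_PATH must be set by the caller.
    export PYTHONPATH="${REPO_PATH:?REPO_PATH must be set}"

    # Pre-create the Triton autotune cache directory (path is an assumption).
    mkdir -p "${HOME}/.triton/autotune"
}
```

Because mkdir -p succeeds when the directory already exists, a function like this can safely run on every replica and on every restart.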

Usage

This script is sourced or executed at the beginning of every Beaker-based GRPO training job. It must run before any Python training code that creates Ray actors.

Code Reference

Source Location

  • Repository: Open Instruct
  • File: configs/beaker_configs/ray_node_setup.sh

Signature

#!/bin/bash
# No function signature; this is an entry-point script.
# Key environment inputs:
#   BEAKER_REPLICA_RANK       - 0 for head, >0 for workers
#   BEAKER_LEADER_REPLICA_HOSTNAME - hostname of the head node
#   REPO_PATH                 - path to the repository root

Import

# Sourced at the top of a Beaker launch script:
source configs/beaker_configs/ray_node_setup.sh

I/O Contract

Inputs

Name | Type | Description
BEAKER_REPLICA_RANK | Environment variable (int) | Rank of the current replica; 0 = head node
BEAKER_LEADER_REPLICA_HOSTNAME | Environment variable (string) | Hostname of the leader replica, resolved to IP via getent hosts
REPO_PATH | Environment variable (string) | Path to the repository for PYTHONPATH
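
The hostname-to-IP step mentioned in the table could look like the sketch below. The helper name resolve_head_ip is hypothetical, and taking the first field of the first output line with awk is an assumption about how the getent output is parsed.

```shell
#!/bin/bash
# Sketch: resolve the leader hostname to an IP address with getent.

resolve_head_ip() {
    local host="${1:?hostname required}"
    # getent prints "<address> <name> ..."; keep the address of the first entry.
    getent hosts "$host" | awk 'NR == 1 { print $1 }'
}
```

A worker could then build its connection address from the output of resolve_head_ip "$BEAKER_LEADER_REPLICA_HOSTNAME"; an empty result would mean the hostname did not resolve.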

Outputs

Name | Type | Description
Running Ray process | Side effect | A Ray head or worker process is started on the current machine
Exit code 0 | Process exit | Workers exit cleanly when head becomes unreachable

Usage Examples

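# In a Beaker job, this script runs at the start of every replica's launch script.
# BEAKER_REPLICA_RANK and BEAKER_LEADER_REPLICA_HOSTNAME are expected to come from
# Beaker; they are exported by hand here only to illustrate each role.

# On replica 0 (head):
export BEAKER_REPLICA_RANK=0
export BEAKER_LEADER_REPLICA_HOSTNAME="node-0.internal"
export REPO_PATH="/workspace/open-instruct"
source configs/beaker_configs/ray_node_setup.sh
# The Ray head is now running; proceed to launch training.

# On replica 1 (worker):
export BEAKER_REPLICA_RANK=1
export BEAKER_LEADER_REPLICA_HOSTNAME="node-0.internal"
export REPO_PATH="/workspace/open-instruct"
source configs/beaker_configs/ray_node_setup.sh
# The worker joins the cluster and polls the head. It exits with code 0 once the
# head becomes unreachable, so nothing after the source line runs on a worker.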

Related Pages

Implements Principle
