Implementation:OpenRLHF Broadcast to vLLM

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Training_Infrastructure
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete mechanism provided by OpenRLHF for synchronizing policy weights from DeepSpeed training workers to vLLM inference engines.

Description

The weight broadcast mechanism gathers the full model state dict from DeepSpeed ZeRO-sharded training workers, then loads it into each vLLM engine's model via Ray remote calls. For LoRA models, only the adapter weights are transferred. The sync happens via NCCL or Ray object store depending on configuration.
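The gather-then-broadcast pattern described above can be sketched in miniature. This is an illustrative stand-in, not the OpenRLHF implementation: plain Python lists and dicts take the place of DeepSpeed ZeRO-3 parameter shards, and a mock class takes the place of a Ray-managed vLLM engine. All names here (`MockVLLMEngine`, `gather_full_state_dict`, `broadcast_to_engines`) are invented for this sketch.

```python
# Minimal sketch of the gather-then-broadcast pattern. Plain dicts and a
# mock engine stand in for DeepSpeed ZeRO-3 shards and Ray-managed vLLM
# engines; all names are illustrative, not the OpenRLHF API.

class MockVLLMEngine:
    """Stands in for a Ray actor wrapping a vLLM engine."""
    def __init__(self):
        self.weights = {}

    def load_weights(self, state_dict):
        # In the real system this call happens via a Ray remote call,
        # and the engine loads the tensors into its model.
        self.weights.update(state_dict)

def gather_full_state_dict(shards):
    # ZeRO-3 partitions each parameter across workers; gathering
    # reassembles the full tensor (here: concatenating list slices).
    full = {}
    for name in shards[0]:
        full[name] = [x for shard in shards for x in shard[name]]
    return full

def broadcast_to_engines(shards, engines):
    state_dict = gather_full_state_dict(shards)
    for engine in engines:  # one remote call per engine
        engine.load_weights(state_dict)
    return state_dict

# Two workers each hold half of every parameter.
shards = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
engines = [MockVLLMEngine(), MockVLLMEngine()]
full = broadcast_to_engines(shards, engines)
```

After the broadcast, every engine holds the same reassembled parameters, which is the invariant the real synchronization step must guarantee before the next generation round.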

Usage

Called by PPOTrainer.fit() after each PPO training epoch, before the next generation round.

Code Reference

Source Location

  • Repository: OpenRLHF
  • File: openrlhf/trainer/ray/ppo_actor.py (broadcast method)

Signature

def broadcast_to_vllm(self) -> None:
    """
    Broadcast updated policy weights to vLLM engines.

    Steps:
    1. Gather full state dict from DeepSpeed ZeRO-3
    2. For each vLLM engine:
       - Send updated weights via Ray
       - Engine loads weights into its model
    3. Synchronize to ensure all engines are updated

    Side Effects:
        - vLLM engines' models updated with latest policy weights
    """

Import

# Called internally by PPOTrainer, not directly imported
# Located in: openrlhf/trainer/ray/ppo_actor.py

I/O Contract

Inputs

Name Type Required Description
(self) ActorPPOTrainer Yes Actor trainer with access to model and vLLM refs

Outputs

Name Type Description
(side effect) None vLLM engines updated with latest policy weights

Usage Examples

# Called within the PPO training loop (simplified; helper names are placeholders)
for episode in range(num_episodes):
    # 1. Generate samples with vLLM
    samples = vllm_generate(prompts)

    # 2. Score and compute advantages
    rewards = reward_model(samples)
    advantages = compute_gae(rewards, values)
    experience = make_experience(samples, advantages)

    # 3. PPO training update
    actor_trainer.ppo_train(experience)

    # 4. Sync weights to vLLM
    actor_trainer.broadcast_to_vllm()

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
