Implementation:OpenRLHF OpenRLHF Create vllm engines

From Leeroopedia


Knowledge Sources
Domains Inference, Training_Infrastructure
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tool, provided by OpenRLHF, for creating vLLM generation engines as Ray actors.

Description

The create_vllm_engines function spawns one or more vLLM LLMEngine instances as Ray actors, each bound to a specific GPU placement group. Each engine is configured with the model path, tensor-parallel size, GPU memory utilization, and generation parameters. The engines accept weight updates from the training actor, so generation stays on-policy as the policy model changes.
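The GPU footprint implied by this description can be sketched in plain Python. The helper `plan_engine_bundles` below is hypothetical and not part of OpenRLHF; it assumes one GPU per placement-group bundle and `tensor_parallel_size` consecutive bundles per engine. The real placement is decided by Ray, so treat this only as a way to reason about how many GPUs a configuration needs.

```python
from typing import List


def plan_engine_bundles(num_engines: int, tensor_parallel_size: int = 1) -> List[List[int]]:
    """Hypothetical helper: assign consecutive bundle indices to each engine.

    Assumption: one GPU per bundle, tensor_parallel_size bundles per engine.
    """
    if num_engines < 1 or tensor_parallel_size < 1:
        raise ValueError("num_engines and tensor_parallel_size must be >= 1")
    return [
        list(range(i * tensor_parallel_size, (i + 1) * tensor_parallel_size))
        for i in range(num_engines)
    ]


# Two engines, each sharded across two GPUs.
plan = plan_engine_bundles(num_engines=2, tensor_parallel_size=2)
total_gpus = sum(len(bundles) for bundles in plan)
print(plan, total_gpus)
```

Under these assumptions the total GPU count is simply num_engines multiplied by tensor_parallel_size.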

Usage

Call create_vllm_engines during PPO/GRPO initialization, after the Ray cluster has been set up. The returned engine references are passed to the PPO trainer, which uses them for sample generation.
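One plausible way a trainer could spread a batch of prompts over the returned engines is round-robin sharding. The function `shard_prompts` below is a hypothetical illustration in plain Python; OpenRLHF's actual dispatch logic is not described on this page and may differ.

```python
from typing import List, Sequence


def shard_prompts(prompts: Sequence[str], num_engines: int) -> List[List[str]]:
    """Hypothetical helper: split prompts round-robin, one shard per engine."""
    if num_engines < 1:
        raise ValueError("num_engines must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_engines)]
    for index, prompt in enumerate(prompts):
        # Prompt i goes to engine i mod num_engines.
        shards[index % num_engines].append(prompt)
    return shards


shards = shard_prompts(["p0", "p1", "p2", "p3", "p4"], num_engines=2)
print(shards)
```

Each shard would then be sent to the engine handle at the same index, so shard sizes differ by at most one prompt.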

Code Reference

Source Location

  • Repository: OpenRLHF
  • File: openrlhf/trainer/ray/vllm_engine.py

Signature

def create_vllm_engines(
    num_engines: int,              # Number of vLLM engine instances
    pretrain: str,                 # Model path or HF ID
    max_len: int,                  # Maximum sequence length
    gpu_memory_utilization: float, # vLLM GPU memory fraction
    tensor_parallel_size: int = 1, # TP size per engine
    seed: int = 42,
    enable_prefix_caching: bool = False,
    **kwargs,
) -> list:
    """Returns list of Ray actor handles for vLLM engines."""

Import

from openrlhf.trainer.ray.vllm_engine import create_vllm_engines

I/O Contract

Inputs

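Name Type Required Description
num_engines int Yes Number of vLLM engine instances to create
pretrain str Yes Model checkpoint path or Hugging Face model ID
max_len int Yes Maximum sequence length
gpu_memory_utilization float Yes Fraction of GPU memory each vLLM engine may use (model weights, activations, and KV cache)
tensor_parallel_size int No Tensor-parallel size per engine (default 1)
seed int No Random seed (default 42)
enable_prefix_caching bool No Whether to enable vLLM prefix caching (default False)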

Outputs

Name Type Description
engines list Ray actor handles for vLLM engine instances
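Because engine creation reserves GPUs, it can be worth checking arguments before making the call. The sketch below validates the inputs num_engines, pretrain, and gpu_memory_utilization; `check_engine_args` is a hypothetical helper, and the accepted ranges are assumptions based on the descriptions in the table rather than checks that OpenRLHF is known to perform.

```python
def check_engine_args(num_engines: int, pretrain: str, gpu_memory_utilization: float) -> None:
    """Hypothetical pre-flight check; raises ValueError on implausible inputs."""
    if not isinstance(num_engines, int) or num_engines < 1:
        raise ValueError("num_engines must be a positive integer")
    if not pretrain:
        raise ValueError("pretrain must be a non-empty model path or ID")
    # Assumption: the value is a fraction of GPU memory, so it lies in (0, 1].
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")


# Passes silently for the values used in the usage example on this page.
check_engine_args(2, "meta-llama/Llama-2-7b-hf", 0.8)
```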

Usage Examples

from openrlhf.trainer.ray.vllm_engine import create_vllm_engines

vllm_engines = create_vllm_engines(
    num_engines=2,
    pretrain="meta-llama/Llama-2-7b-hf",
    max_len=2048,
    gpu_memory_utilization=0.8,
    tensor_parallel_size=1,
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
