Implementation:OpenRLHF OpenRLHF Create vllm engines

Knowledge Sources	OpenRLHF vLLM Documentation
Domains	Inference, Training_Infrastructure
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete OpenRLHF tool for creating vLLM generation engines that run as Ray actors.

Description

The create_vllm_engines function spawns one or more vLLM LLMEngine instances as Ray actors, each bound to a specific GPU placement group. Each engine is configured with the model path, tensor parallel size, GPU memory utilization, and generation parameters. The engines accept weight updates from the training actor, so generation stays on-policy as the policy model changes.
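As a rough sizing aid, the sketch below estimates how many GPUs an engine layout needs. It assumes each engine occupies exactly tensor_parallel_size GPUs and that engines do not share devices with each other or with training actors. The helper name total_gpus_required is hypothetical and is not part of OpenRLHF.

```python
def total_gpus_required(num_engines: int, tensor_parallel_size: int = 1) -> int:
    """Hypothetical helper (not part of OpenRLHF).

    Assumes every engine holds its own tensor_parallel_size GPUs and
    that no GPU is shared between engines.
    """
    if num_engines < 1 or tensor_parallel_size < 1:
        raise ValueError("num_engines and tensor_parallel_size must be >= 1")
    return num_engines * tensor_parallel_size


if __name__ == "__main__":
    # Two engines, each sharded across four GPUs.
    print(total_gpus_required(num_engines=2, tensor_parallel_size=4))  # prints 8
```

If engines are colocated with other actors on the same devices, this count is an upper bound rather than the exact requirement.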

Usage

Called during PPO/GRPO initialization after Ray cluster setup. The returned engine references are passed to the PPO trainer for sample generation.
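One way to picture how a trainer could spread generation work over the returned handles is a round-robin split of the prompt batch, one shard per engine. This is an illustrative sketch only: shard_round_robin is a hypothetical helper, and the source does not state how OpenRLHF actually dispatches prompts to engines.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_round_robin(items: Sequence[T], num_shards: int) -> List[List[T]]:
    """Hypothetical helper: split items into num_shards lists, round-robin.

    Item i goes to shard i % num_shards, so shard sizes differ by at most one.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: List[List[T]] = [[] for _ in range(num_shards)]
    for index, item in enumerate(items):
        shards[index % num_shards].append(item)
    return shards


if __name__ == "__main__":
    prompts = ["p0", "p1", "p2", "p3", "p4"]
    # With two engines, engine 0 would receive p0, p2, p4 and engine 1 would receive p1, p3.
    print(shard_round_robin(prompts, num_shards=2))
```

Each shard would then be sent to the engine handle with the same index.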

Code Reference

Source Location

Repository: OpenRLHF
File: openrlhf/trainer/ray/vllm_engine.py

Signature

def create_vllm_engines(
    num_engines: int,              # Number of vLLM engine instances
    pretrain: str,                 # Model path or HF ID
    max_len: int,                  # Maximum sequence length
    gpu_memory_utilization: float, # vLLM GPU memory fraction
    tensor_parallel_size: int = 1, # TP size per engine
    seed: int = 42,
    enable_prefix_caching: bool = False,
    **kwargs,
) -> list:
    """Returns list of Ray actor handles for vLLM engines."""

Import

from openrlhf.trainer.ray.vllm_engine import create_vllm_engines

I/O Contract

Inputs

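Name	Type	Required	Description
num_engines	int	Yes	Number of vLLM engine instances to create
pretrain	str	Yes	Model checkpoint path or Hugging Face model ID
max_len	int	Yes	Maximum sequence length
gpu_memory_utilization	float	Yes	Fraction of each GPU's memory that vLLM may use (model weights, activations, and KV cache)
tensor_parallel_size	int	No	Tensor parallel size per engine (default 1)
seed	int	No	Random seed (default 42)
enable_prefix_caching	bool	No	Whether to enable vLLM prefix caching (default False)
**kwargs	any	No	Additional keyword arguments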

Outputs

Name	Type	Description
engines	list	Ray actor handles for vLLM engine instances

Usage Examples

from openrlhf.trainer.ray.vllm_engine import create_vllm_engines

vllm_engines = create_vllm_engines(
    num_engines=2,
    pretrain="meta-llama/Llama-2-7b-hf",
    max_len=2048,
    gpu_memory_utilization=0.8,
    tensor_parallel_size=1,
)
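Before launching engines it can help to sanity-check the arguments, since a bad value may otherwise only surface after Ray has scheduled the actors. The sketch below is one possible pre-flight check; check_engine_args is a hypothetical helper that is not part of OpenRLHF, and the accepted ranges are assumptions based on the parameter meanings in the signature.

```python
def check_engine_args(
    num_engines: int,
    max_len: int,
    gpu_memory_utilization: float,
    tensor_parallel_size: int = 1,
) -> None:
    """Hypothetical pre-flight check (not part of OpenRLHF).

    Raises ValueError on the first argument outside its assumed range.
    """
    if num_engines < 1:
        raise ValueError("num_engines must be >= 1")
    if max_len < 1:
        raise ValueError("max_len must be >= 1")
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    # Assumed range: a fraction of GPU memory, so greater than 0 and at most 1.
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")


if __name__ == "__main__":
    # Same values as the usage example above; passes silently.
    check_engine_args(
        num_engines=2,
        max_len=2048,
        gpu_memory_utilization=0.8,
        tensor_parallel_size=1,
    )
```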

Related Pages

Implements Principle

Principle:OpenRLHF_OpenRLHF_vLLM_Inference_Engine

Requires Environment

Uses Heuristic

Heuristic:OpenRLHF_OpenRLHF_vLLM_Embedding_Resize_Warning
