Implementation: OpenRLHF Actor init
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Loading |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool provided by OpenRLHF for loading causal language models with optional LoRA adapters and 4-bit quantization.
Description
The Actor class wraps AutoModelForCausalLM and adds LoRA adapter injection, 4-bit NF4 quantization, MoE model support, Flash Attention 2, and sample packing. It provides a unified forward method that returns action log-probabilities for policy gradient training. The constructor handles all model loading logic including ZeRO-3 compatibility via HfDeepSpeedConfig.
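The heart of that forward method is selecting the log-probability of each realized token from a temperature-scaled softmax over the logits. A minimal pure-Python sketch of the computation (illustrative only; the real class does this with torch tensors and a gather over the vocabulary axis, and the function name here is hypothetical):

```python
import math

def action_log_probs(logits, chosen_ids, temperature=1.0):
    """Per-token log-probabilities of the chosen token ids under a
    temperature-scaled softmax (pure-Python sketch of what a policy
    forward pass computes with tensors)."""
    out = []
    for scores, tok in zip(logits, chosen_ids):
        scaled = [s / temperature for s in scores]
        m = max(scaled)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
        out.append(scaled[tok] - log_z)  # log softmax at the chosen id
    return out

# Uniform logits over two tokens -> log(1/2) for either choice
print(round(action_log_probs([[0.0, 0.0]], [0])[0], 4))  # -0.6931
```

Lowering `temperature` sharpens the distribution, so the preferred token's log-probability moves toward 0 while alternatives become less likely.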
Usage
Import and instantiate when loading a policy model for SFT, DPO, or knowledge distillation training. For PPO workflows, models are loaded inside Ray actors using PolicyModelActor instead.
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/models/actor.py
- Lines: L16-214 (class), L38-54 (__init__)
Signature
class Actor(nn.Module):
    def __init__(
        self,
        pretrain_or_model,                        # str or nn.Module: HF model ID or pretrained model
        attn_implementation="flash_attention_2",  # str: attention implementation
        param_dtype="bf16",                       # str: "bf16" or "fp16"
        load_in_4bit=False,                       # bool: enable NF4 quantization
        lora_rank=0,                              # int: LoRA rank (0 = disabled)
        lora_alpha=16,                            # int: LoRA alpha parameter
        lora_dropout=0,                           # float: LoRA dropout rate
        target_modules=None,                      # list or str: target modules for LoRA (e.g. "all-linear")
        ds_config=None,                           # dict: DeepSpeed config for ZeRO-3
        device_map=None,                          # dict: device placement map
        packing_samples=False,                    # bool: enable sample packing
        temperature=1.0,                          # float: action selection temperature
        use_liger_kernel=False,                   # bool: use Liger Kernel optimization
        **kwargs,
    ) -> None:
Import
from openrlhf.models import Actor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pretrain_or_model | str or nn.Module | Yes | HuggingFace model ID/path or pretrained model instance |
| lora_rank | int | No | LoRA rank (0 disables LoRA) |
| load_in_4bit | bool | No | Enable 4-bit NF4 quantization |
| ds_config | dict | No | DeepSpeed config (required for ZeRO-3) |
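When `lora_rank > 0` and `load_in_4bit=True`, these arguments map onto the standard transformers/peft configuration objects. The sketch below shows the roughly equivalent plain Hugging Face setup, not OpenRLHF's exact internals; the variable names `quant_config` and `lora_config` are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, as implied by load_in_4bit=True
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# LoRA injection, mirroring lora_rank / lora_alpha / target_modules
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

The Actor constructor packages both steps (plus ZeRO-3 handling via HfDeepSpeedConfig) behind its single argument list.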
Outputs
| Name | Type | Description |
|---|---|---|
| Actor instance | Actor (nn.Module) | Wrapped causal LM with forward returning log-probabilities |
Usage Examples
Basic SFT Model Loading
from openrlhf.models import Actor

# Load a model for SFT training
model = Actor(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
    param_dtype="bf16",
)
LoRA + QLoRA Loading
from openrlhf.models import Actor

# Load with LoRA adapters and 4-bit NF4 quantization
model = Actor(
    "meta-llama/Llama-2-7b-hf",
    load_in_4bit=True,
    lora_rank=64,
    lora_alpha=64,
    target_modules="all-linear",
)