Implementation:Volcengine Verl HFModelConfig
| Knowledge Sources | |
|---|---|
| Domains | Model_Configuration, Training_Infrastructure |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Configuration dataclass that defines model architecture, LoRA, optimization, and loading settings for HuggingFace-based models within the verl training framework.
Description
The HFModelConfig dataclass is the central model configuration for actor, critic, reference, and reward models in verl. It holds the model path, the tokenizer and HuggingFace config locations, LoRA adapter settings (rank, alpha, target modules), memory-optimization flags (gradient checkpointing, activation offloading, padding removal, fused kernels, tiled MLP), and the trust_remote_code setting. During __post_init__, it resolves local paths, loads the tokenizer and processor, builds the HuggingFace config through AutoConfig with the override_config parameters applied, and validates the model architectures.
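The __post_init__ pattern described above can be illustrated with a minimal, dependency-free sketch. ToyModelConfig is a hypothetical stand-in, not verl code: the fallback of hf_config_path and tokenizer_path to path, and the dictionary standing in for a loaded HuggingFace config, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyModelConfig:
    """Hypothetical stand-in for HFModelConfig; not the verl implementation."""

    path: str
    hf_config_path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    override_config: dict = field(default_factory=dict)
    # Derived fields, filled in by __post_init__ rather than by the caller.
    hf_config: Any = None
    architectures: Optional[list] = None

    def __post_init__(self):
        # Assumption: dedicated paths fall back to the model path when unset.
        if self.hf_config_path is None:
            self.hf_config_path = self.path
        if self.tokenizer_path is None:
            self.tokenizer_path = self.path

        # Stand-in for a config loaded from disk or a model hub.
        loaded = {"architectures": ["ToyForCausalLM"], "attn_implementation": "eager"}
        # User-supplied overrides take precedence over the loaded values.
        self.hf_config = {**loaded, **self.override_config}
        self.architectures = self.hf_config["architectures"]


cfg = ToyModelConfig(path="some/model", override_config={"attn_implementation": "sdpa"})
print(cfg.hf_config_path, cfg.hf_config["attn_implementation"], cfg.architectures)
```

The point of the pattern is that callers supply only the inputs, and every derived attribute is ready as soon as the constructor returns.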
Usage
This config appears as the model sub-config within actor_rollout_ref, critic, and reward_model sections of the Hydra/OmegaConf configuration. Each worker type (actor, critic, reward model) receives its own HFModelConfig instance to initialize its model.
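As a rough sketch of the layout described above, the snippet below uses a plain nested dict in place of the resolved Hydra/OmegaConf tree and builds one config object per section. ToyModelConfig and the dict contents are illustrative assumptions; verl's own instantiation code is not shown here.

```python
from dataclasses import dataclass


@dataclass
class ToyModelConfig:
    """Hypothetical two-field stand-in for HFModelConfig."""

    path: str
    lora_rank: int = 0


# Plain dict standing in for the resolved Hydra/OmegaConf configuration.
run_config = {
    "actor_rollout_ref": {"model": {"path": "Qwen/Qwen2.5-7B", "lora_rank": 64}},
    "critic": {"model": {"path": "Qwen/Qwen2.5-7B"}},
    "reward_model": {"model": {"path": "Qwen/Qwen2.5-7B"}},
}

# Each section gets its own, independent model config instance.
model_configs = {
    section: ToyModelConfig(**body["model"]) for section, body in run_config.items()
}

for section, model_cfg in model_configs.items():
    print(section, model_cfg)
```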
Code Reference
Source Location
- Repository: verl
- File: verl/workers/config/model.py
- Lines: 72-209
Signature
@dataclass
class HFModelConfig(BaseConfig):
    _mutable_fields = {
        "hf_config_path", "tokenizer_path", "hf_config",
        "generation_config", "tokenizer", "processor",
        "local_path", "architectures", "local_hf_config_path",
        "local_tokenizer_path",
    }

    path: str = MISSING
    local_path: Optional[str] = None
    hf_config_path: Optional[str] = None
    local_hf_config_path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    local_tokenizer_path: Optional[str] = None
    load_tokenizer: bool = True
    hf_config: Any = None
    generation_config: Any = None
    tokenizer: Any = None
    processor: Any = None
    use_shm: bool = False
    trust_remote_code: bool = False
    custom_chat_template: Optional[str] = None
    external_lib: Optional[str] = None
    override_config: dict = field(default_factory=dict)
    enable_gradient_checkpointing: bool = True
    enable_activation_offload: bool = False
    use_remove_padding: bool = True

    # LoRA configuration
    lora_rank: int = 0
    lora_alpha: int = 16
    target_modules: Optional[str] = "all-linear"
    target_parameters: Optional[list[str]] = None
    exclude_modules: Optional[str] = None
    lora_adapter_path: Optional[str] = None
    lora: dict[str, Any] = field(default_factory=dict)

    use_liger: bool = False
    use_fused_kernels: bool = False
    fused_kernel_options: dict = field(default_factory=dict)
    tiled_mlp: dict = field(default_factory=lambda: {"enabled": False, "num_shards": 4})
    architectures: Optional[list[str]] = None
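The _mutable_fields set in the signature suggests that most fields are locked after construction while the listed ones may still be reassigned (for example by __post_init__). The sketch below shows one way such a rule could be enforced. ToyBaseConfig and its __setattr__ are assumptions for illustration; the actual BaseConfig behavior is not documented on this page.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass
class ToyBaseConfig:
    """Hypothetical base class; an assumption about how _mutable_fields is used."""

    # Class attribute without annotation, so it is not treated as a dataclass field.
    _mutable_fields = set()

    def __setattr__(self, name: str, value: Any) -> None:
        # The first assignment (made by the generated __init__) is always allowed.
        # Later assignments succeed only for names listed in _mutable_fields.
        if name in self.__dict__ and name not in self._mutable_fields:
            raise FrozenInstanceError(f"Field '{name}' is frozen")
        super().__setattr__(name, value)


@dataclass
class ToyModelConfig(ToyBaseConfig):
    _mutable_fields = {"local_path", "tokenizer"}

    path: str = ""
    local_path: Any = None
    tokenizer: Any = None


cfg = ToyModelConfig(path="some/model")
cfg.local_path = "/tmp/some-model"  # allowed: listed in _mutable_fields

path_is_frozen = False
try:
    cfg.path = "other/model"  # rejected: not listed in _mutable_fields
except FrozenInstanceError:
    path_is_frozen = True
```

Under this reading, user-facing inputs such as path stay fixed once the config is built, while derived attributes remain writable.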
Import
from verl.workers.config.model import HFModelConfig
I/O Contract
Inputs (Key Configuration Fields)
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | HuggingFace model ID or local path to model weights |
| lora_rank | int | No | LoRA rank; 0 disables LoRA (default: 0) |
| lora_alpha | int | No | LoRA alpha scaling factor (default: 16) |
| target_modules | Optional[str] | No | LoRA target module specification (default: "all-linear") |
| exclude_modules | Optional[str] | No | Modules to exclude from LoRA adaptation |
| enable_gradient_checkpointing | bool | No | Enable gradient checkpointing for memory savings (default: True) |
| enable_activation_offload | bool | No | Offload activations to CPU during checkpointing (default: False) |
| use_remove_padding | bool | No | Remove padding for efficient computation (default: True) |
| use_fused_kernels | bool | No | Use fused CUDA kernels for optimization (default: False) |
| trust_remote_code | bool | No | Trust remote code when loading models (default: False) |
| override_config | dict | No | Dictionary of model config overrides (e.g., attn_implementation) |
| lora_adapter_path | Optional[str] | No | Path to pre-trained LoRA adapter for continued training |
| custom_chat_template | Optional[str] | No | Custom chat template string for the tokenizer |
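To make the relationship between lora_rank and lora_alpha concrete, the sketch below applies the standard LoRA formulation, in which the low-rank update is scaled by lora_alpha / lora_rank and each adapted linear layer gains rank * (d_in + d_out) trainable parameters. The helper names and the 4096-wide layer are illustrative assumptions, and the sketch assumes the values reach the adapter library unchanged.

```python
def lora_scaling(lora_rank: int, lora_alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA."""
    if lora_rank <= 0:
        raise ValueError("lora_rank must be positive; 0 disables LoRA")
    return lora_alpha / lora_rank


def lora_params_per_linear(d_in: int, d_out: int, lora_rank: int) -> int:
    """Trainable parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return lora_rank * (d_in + d_out)


# Values from the LoRA example on this page: lora_rank=64, lora_alpha=128.
scaling = lora_scaling(64, 128)

# Hypothetical 4096 x 4096 projection, chosen only to keep the arithmetic simple.
adapter_params = lora_params_per_linear(4096, 4096, 64)
full_params = 4096 * 4096

print(scaling)                       # 2.0
print(adapter_params)                # 524288
print(adapter_params / full_params)  # 0.03125
```

Doubling lora_alpha together with lora_rank keeps the scaling constant, which is why the two values are often chosen as a pair.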
Outputs (after __post_init__)
| Name | Type | Description |
|---|---|---|
| tokenizer | Any | Loaded HuggingFace tokenizer instance |
| processor | Any | Loaded HuggingFace processor instance (for multimodal models) |
| hf_config | PretrainedConfig | HuggingFace model configuration loaded via AutoConfig, with override_config applied |
| generation_config | Any | Generation configuration from the model |
| local_path | str | Resolved local path to model weights |
| architectures | list[str] | Model architecture names extracted from config |
| share_embeddings_and_output_weights | bool | Whether input/output embeddings are tied |
Usage Examples
# Configuration (YAML) - Full fine-tuning
# actor_rollout_ref:
#   model:
#     path: Qwen/Qwen2.5-7B
#     enable_gradient_checkpointing: True
#     use_remove_padding: True
#     trust_remote_code: False
#     override_config:
#       attn_implementation: flash_attention_2

# Configuration (YAML) - LoRA fine-tuning
# actor_rollout_ref:
#   model:
#     path: Qwen/Qwen2.5-32B
#     lora_rank: 64
#     lora_alpha: 128
#     target_modules: all-linear
#     enable_gradient_checkpointing: True
#     use_fused_kernels: True

# Programmatic usage
from verl.workers.config.model import HFModelConfig

config = HFModelConfig(
    path="Qwen/Qwen2.5-7B",
    enable_gradient_checkpointing=True,
    use_remove_padding=True,
    use_fused_kernels=True,
    trust_remote_code=False,
    override_config={"attn_implementation": "flash_attention_2"},
)

# After __post_init__, the following are available:
print(config.tokenizer)      # HuggingFace tokenizer
print(config.hf_config)      # HuggingFace config with overrides applied
print(config.architectures)  # e.g., ["Qwen2ForCausalLM"]
print(config.local_path)     # Resolved local path

# LoRA configuration for parameter-efficient training
lora_config = HFModelConfig(
    path="Qwen/Qwen2.5-32B",
    lora_rank=64,
    lora_alpha=128,
    target_modules="all-linear",
    exclude_modules="lm_head",
)