
Heuristic: OpenRLHF vLLM Embedding Resize Warning

From Leeroopedia




Knowledge Sources
Domains Debugging, LLMs, Inference
Last Updated 2026-02-07 10:00 GMT

Overview

Never resize token embeddings when vLLM is enabled; doing so causes a vocab size mismatch between the trainer and the vLLM engine.

Description

When OpenRLHF uses vLLM for generation in PPO/online RL workflows, the tokenizer and model embeddings must remain synchronized between the training model and the vLLM inference engine. If `resize_token_embeddings` is called on the training model (for example, to add a pad token), the embedding matrix size changes but vLLM's copy of the model retains the original size. This causes a vocab size mismatch that leads to cryptic errors during weight synchronization or generation.
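The divergence described above can be sketched with a minimal simulation (hypothetical vocab sizes, not real OpenRLHF code): resizing touches only the trainer's copy of the embedding matrix, so the two copies no longer agree.

```python
# Minimal sketch (hypothetical sizes) of how resizing diverges the two copies.
# The trainer and the vLLM engine each hold their own embedding matrix;
# only the trainer's copy is resized.
trainer_vocab = 32000  # rows in the trainer's embed_tokens.weight
vllm_vocab = 32000     # vLLM loaded the checkpoint with the same size

# Adding a new pad token and resizing grows ONLY the trainer's matrix,
# e.g. model.resize_token_embeddings(len(tokenizer)) after add_special_tokens.
trainer_vocab += 1

# Weight sync requires identical shapes, so the next broadcast fails
# with a cryptic shape/vocab mismatch error.
shapes_match = (trainer_vocab == vllm_vocab)
```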

Usage

Apply this heuristic whenever working with PPO training or any workflow that uses vLLM for generation. Instead of resizing embeddings, set `pad_token` to an existing token (typically `eos_token`). OpenRLHF's `get_tokenizer` function handles this automatically.
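A hedged sketch of the safe pattern, using a stand-in tokenizer object (not the real `transformers` class) so the example is self-contained:

```python
# Stand-in for a HF-style tokenizer whose checkpoint defines no pad token.
class StubTokenizer:
    def __init__(self):
        self.eos_token, self.eos_token_id = "</s>", 2
        self.pad_token, self.pad_token_id = None, None

tokenizer = StubTokenizer()

# Reuse the existing eos token instead of adding a new token and resizing.
# This mirrors the pattern in openrlhf/utils/utils.py.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
```

Because no token was added, `len(tokenizer)` is unchanged and `resize_token_embeddings` is never needed.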

The Insight (Rule of Thumb)

  • Action: Do NOT call `model.resize_token_embeddings()` when vLLM is enabled. Instead, reuse existing tokens for padding.
  • Value: Set `tokenizer.pad_token = tokenizer.eos_token` and `tokenizer.pad_token_id = tokenizer.eos_token_id`.
  • Trade-off: None. Using eos_token as pad_token is safe because padding is masked during loss computation.
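The trade-off claim rests on padding being masked out of the loss. A small sketch of that masking, using the HF-style convention of an ignore index of -100 (plain Python stand-in for the framework's cross-entropy):

```python
import math

IGNORE_INDEX = -100  # HF-style convention for positions excluded from loss

def masked_nll(log_probs, labels):
    """Mean negative log-likelihood over non-ignored positions only."""
    terms = [-lp[y] for lp, y in zip(log_probs, labels) if y != IGNORE_INDEX]
    return sum(terms) / len(terms)

# Two real tokens followed by two pad positions (labels set to -100).
log_probs = [
    {0: math.log(0.9)},   # real token, predicted well
    {1: math.log(0.8)},   # real token
    {2: math.log(0.01)},  # pad position: terrible prediction...
    {2: math.log(0.01)},  # ...but ignored, so it cannot affect the loss
]
labels = [0, 1, IGNORE_INDEX, IGNORE_INDEX]

loss = masked_nll(log_probs, labels)  # only the first two terms contribute
```

Since pad positions never reach the loss, it is irrelevant that their token id collides with `eos_token_id`.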

Reasoning

The vLLM engine loads model weights independently. When the training loop broadcasts updated weights to vLLM, the weight shapes must match exactly. Resizing embeddings changes the shape of `model.embed_tokens.weight` and `lm_head.weight`, breaking the weight sync protocol. This was documented in the llama-recipes repository (PR #196) as a common pitfall.

Code evidence from `openrlhf/utils/utils.py:43-49`:

# NOTE: When enable vLLM, do not resize_token_embeddings, or the vocab size will mismatch with vLLM.
# https://github.com/facebookresearch/llama-recipes/pull/196
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    if model is not None:
        model.config.pad_token_id = tokenizer.pad_token_id
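To make the failure mode concrete, here is a hypothetical pre-sync guard (not OpenRLHF's actual code) that compares tensor shapes before broadcasting trainer weights to the vLLM workers, surfacing the mismatch with a readable error instead of a cryptic one:

```python
# Hypothetical guard: verify every trainer tensor shape matches what the
# vLLM engine loaded, before attempting the weight broadcast.
def check_weight_shapes(trainer_shapes, vllm_shapes):
    mismatched = [
        name for name, shape in trainer_shapes.items()
        if vllm_shapes.get(name) != shape
    ]
    if mismatched:
        raise ValueError(
            f"shape mismatch (did you resize token embeddings?): {mismatched}"
        )

# Example shapes after a resize: trainer grew by one row, vLLM did not.
trainer = {"model.embed_tokens.weight": (32001, 4096),
           "lm_head.weight": (32001, 4096)}
engine = {"model.embed_tokens.weight": (32000, 4096),
          "lm_head.weight": (32000, 4096)}
```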
