Heuristic: OpenRLHF vLLM Embedding Resize Warning
| Knowledge Sources | |
|---|---|
| Domains | Debugging, LLMs, Inference |
| Last Updated | 2026-02-07 10:00 GMT |
Overview
Never resize token embeddings when vLLM is enabled; doing so causes a vocab size mismatch between the training model and the vLLM engine.
Description
When OpenRLHF uses vLLM for generation in PPO/online RL workflows, the tokenizer and model embeddings must remain synchronized between the training model and the vLLM inference engine. If `resize_token_embeddings` is called on the training model (for example, to add a pad token), the embedding matrix size changes but vLLM's copy of the model retains the original size. This causes a vocab size mismatch that leads to cryptic errors during weight synchronization or generation.
Usage
Apply this heuristic in PPO training or any workflow that uses vLLM for generation. Instead of resizing embeddings, set `pad_token` to an existing token (typically `eos_token`). The `get_tokenizer` function in OpenRLHF handles this automatically.
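A minimal sketch of the safe pattern. The `set_pad_token` helper and the `SimpleNamespace` stand-in tokenizer are illustrative, not part of OpenRLHF; in real code you would pass the objects returned by `AutoTokenizer.from_pretrained` and the model config.

```python
from types import SimpleNamespace

def set_pad_token(tokenizer, model_config=None):
    # Reuse the existing eos_token as pad_token instead of adding a new
    # token, which would require resize_token_embeddings and break vLLM.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
        if model_config is not None:
            model_config.pad_token_id = tokenizer.pad_token_id
    return tokenizer

# Stand-in tokenizer and config for illustration only.
tok = SimpleNamespace(pad_token=None, pad_token_id=None,
                      eos_token="</s>", eos_token_id=2)
cfg = SimpleNamespace(pad_token_id=None)
set_pad_token(tok, cfg)
print(tok.pad_token_id, cfg.pad_token_id)  # 2 2
```

Because padding positions are masked out of the loss, reusing `eos_token` this way has no training cost.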
The Insight (Rule of Thumb)
- Action: Do NOT call `model.resize_token_embeddings()` when vLLM is enabled. Instead, reuse existing tokens for padding.
- Value: Set `tokenizer.pad_token = tokenizer.eos_token` and `tokenizer.pad_token_id = tokenizer.eos_token_id`.
- Trade-off: None. Using eos_token as pad_token is safe because padding is masked during loss computation.
Reasoning
The vLLM engine loads model weights independently. When the training loop broadcasts updated weights to vLLM, the weight shapes must match exactly. Resizing embeddings changes the shape of `model.embed_tokens.weight` and `lm_head.weight`, breaking the weight sync protocol. This was documented in the llama-recipes repository (PR #196) as a common pitfall.
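The failure mode above can be sketched with a strict shape check, which mimics what the broadcast/copy during weight sync enforces. The vocab and hidden sizes here are illustrative, not taken from a real model.

```python
# Trainer resizes its embedding matrix (e.g. to add a pad token)...
trainer_vocab = 32000 + 1
# ...but the vLLM engine still holds weights with the original vocab size.
vllm_vocab = 32000
hidden = 4096

def sync_weights(src_shape, dst_shape):
    # A weight broadcast copies tensors element-for-element, so any
    # shape difference is a hard error rather than a silent truncation.
    if src_shape != dst_shape:
        raise ValueError(f"shape mismatch: {src_shape} vs {dst_shape}")

try:
    sync_weights((trainer_vocab, hidden), (vllm_vocab, hidden))
except ValueError as e:
    print(e)  # shape mismatch: (32001, 4096) vs (32000, 4096)
```

Keeping the vocab size untouched on both sides is what makes the sync protocol work without special-casing embedding tensors.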
Code evidence from `openrlhf/utils/utils.py:43-49`:

```python
# NOTE: When enable vLLM, do not resize_token_embeddings, or the vocab size will mismatch with vLLM.
# https://github.com/facebookresearch/llama-recipes/pull/196
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    if model is not None:
        model.config.pad_token_id = tokenizer.pad_token_id
```