Principle:OpenGVLab InternVL Prefix LM Conversion
| Principle Name | Prefix_LM_Conversion |
|---|---|
| Domains | Language Modeling, Attention Masking, Model Surgery |
| Last Updated | 2026-02-07 14:00 GMT |
Summary
Prefix LM Conversion is the technique of transforming a standard causal (autoregressive) language model into a prefix language model by modifying its attention masking. In a prefix LM, a designated prefix portion of the input sequence uses bidirectional attention: every prefix token attends to every other prefix token. The remainder of the sequence (the target) keeps causal, left-to-right masking, so each target token attends to the full prefix and only to earlier target tokens. This lets the model encode the prompt context bidirectionally before generating outputs autoregressively.
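The mask described above can be written down directly. The following is a minimal pure-Python sketch. The function name prefix_lm_mask is hypothetical, and the sketch assumes the prefix is a contiguous run at the start of the sequence. Entry [q][k] is True when query position q may attend to key position k. It is obtained by combining a lower-triangular causal mask with the per-token prefix flags through a logical OR along the key axis.

```python
def prefix_lm_mask(bidirectional_mask):
    """Build a prefix-LM attention mask (hypothetical helper).

    bidirectional_mask[i] is 1 for prefix tokens and 0 for target tokens.
    Returns a seq_len x seq_len list of booleans where entry [q][k] is True
    if query position q may attend to key position k.
    """
    n = len(bidirectional_mask)
    return [
        # Causal rule (k <= q) OR'ed with "key k belongs to the prefix".
        [k <= q or bool(bidirectional_mask[k]) for k in range(n)]
        for q in range(n)
    ]


# Usage: 3 prefix tokens followed by 2 target tokens.
mask = prefix_lm_mask([1, 1, 1, 0, 0])
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
# 111..
# 111..
# 111..
# 1111.
# 11111
```

The top-left 3x3 block is fully open (bidirectional prefix), while the last two rows remain causal.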
Motivation
Standard causal LMs process every token with a left-to-right mask, so early tokens cannot attend to later tokens even within the input prompt. As a result, the representation of a prompt token cannot use information from the prompt tokens that follow it. Prefix LMs address this by allowing bidirectional attention within the prompt (prefix) while preserving the autoregressive property needed for generation. This is particularly useful in multimodal settings: visual tokens prepended to the text can then attend to the text that follows them and become fully contextualized.
Structure
The conversion is performed by monkey-patching the model's forward and generate methods:
- A new bidirectional_mask input tensor (shape [batch_size, seq_length]) is introduced, where 1 indicates prefix tokens and 0 indicates target tokens.
- During forward, the causal attention mask is temporarily widened so that every position can attend to all positions where bidirectional_mask is 1. The original causal mask is restored after the forward pass.
- During generate, all attention masks are set to fully bidirectional, so the entire prompt is treated as the prefix. HuggingFace's key/value caching keeps token generation causal, because each newly generated token is processed as a single query against cached keys that all precede it.
- The original forward and generate methods are preserved as _original_forward and _original_generate for restoration or fallback.
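To make the patching pattern concrete, here is a toy sketch in plain Python. ToyCausalLM and convert_to_prefix_lm are hypothetical names. The toy model stores its causal mask as a nested list, standing in for the attention-bias buffer a real architecture would hold as a tensor; a real converter would patch every attention module (and handle batching) rather than a single attribute. The sketch also omits the KV-cache decoding loop that keeps real generation causal. It does show the steps listed above: saving the originals, temporarily widening the mask in forward, and making the mask fully bidirectional in generate before restoring it.

```python
import types


class ToyCausalLM:
    """Hypothetical stand-in for a causal LM. Its causal mask plays the role of
    the attention-bias buffer a real model would keep as a tensor."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.causal_mask = self._tril(max_len)

    @staticmethod
    def _tril(n):
        # Lower-triangular mask: query q may attend to keys 0..q.
        return [[k <= q for k in range(n)] for q in range(n)]

    def _visible(self, n):
        # The part of the mask used for a sequence of length n.
        return [row[:n] for row in self.causal_mask[:n]]

    def forward(self, input_ids):
        return self._visible(len(input_ids))

    def generate(self, input_ids):
        # A real generate() would decode step by step with a KV cache.
        return self._visible(len(input_ids))


def convert_to_prefix_lm(model):
    """Hypothetical converter: patch forward/generate, keeping the originals."""
    model._original_forward = model.forward
    model._original_generate = model.generate

    def forward(self, input_ids, bidirectional_mask=None):
        if bidirectional_mask is None:
            return self._original_forward(input_ids)
        if len(bidirectional_mask) > self.max_len:
            raise ValueError("sequence longer than the mask buffer")
        # Pad to the buffer length, then OR the prefix flags in along the key
        # axis so every query may attend to every prefix key.
        padded = list(bidirectional_mask) + [0] * (self.max_len - len(bidirectional_mask))
        self.causal_mask = [
            [allowed or bool(padded[k]) for k, allowed in enumerate(row)]
            for row in self.causal_mask
        ]
        try:
            return self._original_forward(input_ids)
        finally:
            self.causal_mask = self._tril(self.max_len)  # restore causal mask

    def generate(self, input_ids):
        # The whole prompt is the prefix: make the mask fully bidirectional.
        self.causal_mask = [[True] * self.max_len for _ in range(self.max_len)]
        try:
            return self._original_generate(input_ids)
        finally:
            self.causal_mask = self._tril(self.max_len)  # restore causal mask

    model.forward = types.MethodType(forward, model)
    model.generate = types.MethodType(generate, model)
    return model


model = convert_to_prefix_lm(ToyCausalLM(max_len=8))
# Prefix rows 0 and 1 now see each other; target rows 2 and 3 stay causal.
print(model.forward([5, 6, 7, 8], bidirectional_mask=[1, 1, 0, 0]))
print(model.generate([5, 6, 7]))  # fully bidirectional over the prompt
```

Using try/finally guarantees the causal mask is restored even if the wrapped call raises, so a failed call cannot leave the model in prefix-LM mode.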
Applicability
This principle applies when:
- A causal LM needs to be used in a prefix-LM mode (e.g., for instruction-following or multimodal tasks)
- The model architecture is one whose attention masking the conversion knows how to patch (GPT-2, GPT-Neo, GPT-NeoX, GPT-J, BLOOM, OPT)
- Bidirectional context over the input prompt would improve task performance
- The model is used with a model family such as MPT, whose configuration supports a prefix_lm option
Limitations
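- Requires model-specific surgery for each architecture, because each one implements its causal attention mask differently
- Training data must include bidirectional_mask annotations, and labels must be masked (set to -100, the default ignore_index of PyTorch's cross-entropy loss) at prefix positions so that the loss is computed only on target tokens
- Not all HuggingFace model architectures are supported; an architecture outside the list above needs its own patch
- The performance benefit depends on the task type; pure generation tasks with little or no prompt have little prefix to encode bidirectionally and may not benefit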
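The data requirement in the limitations above can be illustrated with a small preprocessing helper. build_prefix_lm_example is a hypothetical name and the right-padding scheme is an assumption; -100 is used because it is the default ignore_index of torch.nn.CrossEntropyLoss. HuggingFace causal LMs shift labels internally, so the labels here stay aligned with input_ids.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_prefix_lm_example(prompt_ids, target_ids, max_len, pad_id):
    """Hypothetical preprocessing helper for prefix-LM training.

    Returns input_ids, attention_mask, bidirectional_mask and labels,
    right-padded to max_len.
    """
    length = len(prompt_ids) + len(target_ids)
    if length > max_len:
        raise ValueError(f"example has {length} tokens, max_len is {max_len}")
    pad = max_len - length

    return {
        "input_ids": list(prompt_ids) + list(target_ids) + [pad_id] * pad,
        # Real tokens are attended to; padding is not.
        "attention_mask": [1] * length + [0] * pad,
        # 1 marks the prefix (prompt); target tokens and padding get 0.
        "bidirectional_mask": [1] * len(prompt_ids) + [0] * (len(target_ids) + pad),
        # Loss only on target tokens: prefix and padding positions are ignored.
        "labels": [IGNORE_INDEX] * len(prompt_ids) + list(target_ids) + [IGNORE_INDEX] * pad,
    }


example = build_prefix_lm_example([11, 12, 13], [21, 22], max_len=7, pad_id=0)
print(example["bidirectional_mask"])  # [1, 1, 1, 0, 0, 0, 0]
print(example["labels"])              # [-100, -100, -100, 21, 22, -100, -100]
```

Padding positions get 0 in both masks and -100 in labels, so they contribute neither context nor loss.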