Principle:OpenGVLab InternVL Prefix LM Conversion

Principle Name	Prefix_LM_Conversion
Domains	Language Modeling, Attention Masking, Model Surgery
Last Updated	2026-02-07 14:00 GMT

Summary

Prefix LM Conversion is the technique of transforming a standard causal (autoregressive) language model into a prefix language model by modifying its attention masking mechanism. In a prefix LM, a designated prefix portion of the input sequence uses bidirectional attention (every prefix token attends to every other prefix token), while the remainder of the sequence (the target) retains causal (left-to-right) masking. This lets the model encode the prompt context bidirectionally before generating outputs autoregressively.
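
To make the masking pattern concrete, here is a minimal sketch in plain Python, with nested lists standing in for tensors. The function name prefix_lm_mask is hypothetical and is not taken from the InternVL code; the sketch also assumes the prefix is a contiguous block at the start of the sequence.

```python
def prefix_lm_mask(is_prefix):
    """Build a [seq, seq] mask; entry [q][k] is 1 if query q may attend to key k.

    is_prefix: list of 0/1 flags, 1 for prefix tokens and 0 for target tokens.
    Assumption: the prefix is a contiguous block at the start of the sequence.
    """
    n = len(is_prefix)
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q  # standard left-to-right rule
            both_prefix = bool(is_prefix[q] and is_prefix[k])  # bidirectional inside the prefix
            row.append(1 if causal or both_prefix else 0)
        mask.append(row)
    return mask


# Three prefix tokens followed by two target tokens.
for row in prefix_lm_mask([1, 1, 1, 0, 0]):
    print(row)
```

The first three rows come out fully open across the three prefix columns, while the last two rows keep the usual lower-triangular shape. With an all-zero input the function reduces to an ordinary causal mask.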

Motivation

Standard causal LMs apply a left-to-right mask to every token, so early tokens cannot attend to later tokens, even within the input prompt. As a result, the representation of an early prompt token never reflects the prompt content that follows it. Prefix LMs address this by allowing bidirectional attention within the prompt (prefix) while preserving the autoregressive property needed for generation. This is particularly useful in multimodal settings, where visual tokens prepended to the text should be fully contextualized.

Structure

The conversion is performed by monkey-patching the model's forward and generate methods:

A new bidirectional_mask input tensor (shape [batch_size, seq_length]) is introduced, where 1 indicates prefix tokens and 0 indicates target tokens.
During forward, the causal attention mask is modified to allow bidirectional attention at the positions where bidirectional_mask is 1, and the original causal mask is restored after the forward pass.
During generate, the attention masks are set to fully bidirectional while the prompt is encoded. Causal behavior during token generation is maintained by Hugging Face's key/value caching, since each newly generated token attends only to the cached positions that precede it.
The original forward and generate methods are preserved as _original_forward and _original_generate for restoration or fallback.
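
The patching steps above can be sketched on a toy object. This is an illustration only: ToyCausalLM, convert_to_prefix_lm, and the mask_mode attribute are hypothetical stand-ins, and a real converter would edit the attention mask tensors of the specific architecture instead of flipping a string flag.

```python
import types


class ToyCausalLM:
    """Stand-in for a causal LM; it only reports which mask mode it ran with."""

    def __init__(self):
        self.mask_mode = "causal"

    def forward(self, input_ids):
        return {"n_tokens": len(input_ids), "mask_mode": self.mask_mode}


def convert_to_prefix_lm(model):
    """Patch forward in place so that it accepts a bidirectional_mask argument."""
    if hasattr(model, "_original_forward"):
        return model  # already converted, nothing to do

    # Keep the unpatched bound method for restoration or fallback.
    model._original_forward = model.forward

    def forward(self, input_ids, bidirectional_mask=None):
        if bidirectional_mask is None:
            return self._original_forward(input_ids)
        if len(bidirectional_mask) != len(input_ids):
            raise ValueError("bidirectional_mask must match input_ids in length")
        self.mask_mode = "prefix"  # stand-in for editing the attention mask
        try:
            return self._original_forward(input_ids)
        finally:
            self.mask_mode = "causal"  # restore the causal mask afterwards

    model.forward = types.MethodType(forward, model)
    return model


model = convert_to_prefix_lm(ToyCausalLM())
print(model.forward([5, 6, 7, 8], bidirectional_mask=[1, 1, 0, 0]))
print(model.forward([5, 6, 7, 8]))
```

The try/finally block mirrors the "modify, run, restore" order described above: the causal state is put back even if the wrapped forward raises. Calling the patched method without a bidirectional_mask falls through to the original behavior.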

Applicability

This principle applies when:

A causal LM needs to be used in a prefix-LM mode (e.g., for instruction-following or multimodal tasks)
The model architecture supports attention bias manipulation (GPT-2, GPT-Neo, GPT-NeoX, GPT-J, BLOOM, OPT)
Bidirectional context over the input prompt would improve task performance
The model is used with code, such as the MPT model implementation, that supports a prefix_lm configuration

Limitations

Requires model-specific surgery for each architecture (different attention mask mechanisms)
Training data must include bidirectional_mask annotations, and labels must be masked (set to -100) at prefix positions so that the loss is computed only on target tokens
Not all Hugging Face model architectures are supported
Performance benefit depends on task type; pure generation tasks may not benefit
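
The training-data requirement above can be illustrated with a small sketch. The helper build_prefix_lm_example and the token IDs are hypothetical; the sketch assumes labels are aligned with input_ids and shifted inside the model, as is usual for Hugging Face causal LMs.

```python
IGNORE_INDEX = -100  # label value skipped by the loss, as noted in the limitations


def build_prefix_lm_example(prefix_ids, target_ids):
    """Assemble one training example from prefix and target token IDs."""
    prefix_ids = list(prefix_ids)
    target_ids = list(target_ids)
    return {
        "input_ids": prefix_ids + target_ids,
        # 1 marks prefix tokens (bidirectional), 0 marks target tokens (causal).
        "bidirectional_mask": [1] * len(prefix_ids) + [0] * len(target_ids),
        # Prefix positions are excluded from the loss; targets keep their IDs.
        "labels": [IGNORE_INDEX] * len(prefix_ids) + target_ids,
    }


example = build_prefix_lm_example(prefix_ids=[11, 12, 13], target_ids=[21, 22])
for name, values in example.items():
    print(name, values)
```

All three lists have the same length, so they can be batched and padded together.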

Related Pages

Implementation:OpenGVLab_InternVL_HF_PrefixLM_Converter
