Principle:OpenGVLab InternVL Preference Data Construction
| Knowledge Sources | |
|---|---|
| Domains | Alignment, Data_Engineering, Vision_Language |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A data construction pipeline that generates preference pairs (chosen/rejected responses) for training multimodal models with preference optimization.
Description
Preference data construction creates training data for DPO/MPO alignment by generating multiple candidate responses and pairing correct responses (chosen) with incorrect ones (rejected). InternVL uses two complementary approaches:
- Correctness-based pairs: The model generates multiple reasoning chains for each question. Responses that arrive at the correct answer become chosen; incorrect responses become rejected. This teaches the model to prefer correct reasoning.
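- VisualPRM (Process Reward Model) pairs: Step-level reward scores are estimated with Monte Carlo sampling: several continuations are sampled from each intermediate reasoning step, and the step's score reflects how often those continuations reach the correct answer. Steps that lead to correct outcomes more often receive higher rewards, which gives a fine-grained preference signal at the level of individual reasoning steps.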
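The Monte Carlo step scoring described in the second bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the InternVL implementation: `mc_step_scores`, `rollout`, and `toy_rollout` are hypothetical names, and the toy rollout is a deterministic stand-in for sampling continuations from the model.

```python
from typing import Callable, List


def mc_step_scores(
    steps: List[str],
    rollout: Callable[[List[str]], str],
    answer: str,
    num_rollouts: int = 8,
) -> List[float]:
    """For each step prefix, return the fraction of rollouts that reach `answer`."""
    scores = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        # Sample continuations from this prefix and count how many end correctly.
        hits = sum(rollout(prefix) == answer for _ in range(num_rollouts))
        scores.append(hits / num_rollouts)
    return scores


def toy_rollout(prefix: List[str]) -> str:
    """Hypothetical stand-in for the model: a prefix containing a bad step never recovers."""
    return "wrong" if any("error" in step for step in prefix) else "42"


steps = ["read the axis labels", "sum the bar heights", "arithmetic error in the total"]
scores = mc_step_scores(steps, toy_rollout, answer="42", num_rollouts=4)
print(scores)
```

With a real model, `rollout` would sample a completion conditioned on the image, the question, and the step prefix, so the scores would be noisy estimates rather than exact values.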
The pipeline processes samples across multiple dynamic resolution configurations (tile counts: 1, 6, 12, 18, 24) to ensure diverse visual inputs.
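One way to picture the sweep over tile counts is a loop that samples candidates under each configuration and records which configuration produced them. The sketch below is an assumption about how such a sweep could be organized: `sample_across_tiles`, `generate`, and `samples_per_config` are hypothetical, and only the tile counts themselves come from the text above.

```python
from typing import Callable, Dict, List

# Tile counts listed in the description above.
TILE_COUNTS = (1, 6, 12, 18, 24)


def sample_across_tiles(
    question: str,
    generate: Callable[[str, int], str],
    samples_per_config: int = 2,
) -> List[Dict[str, object]]:
    """Collect candidate responses under each tile count, tagged with that count."""
    candidates = []
    for num_tiles in TILE_COUNTS:
        for _ in range(samples_per_config):
            candidates.append(
                {"num_tiles": num_tiles, "response": generate(question, num_tiles)}
            )
    return candidates


def stub_generate(question: str, num_tiles: int) -> str:
    """Hypothetical stand-in for a model call that accepts a tile count."""
    return f"response to {question!r} using {num_tiles} tile(s)"


candidates = sample_across_tiles("What is shown in this chart?", stub_generate)
print(len(candidates))
```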
Usage
Use preference data construction before MPO training to generate the MMPR-format preference dataset. This is a prerequisite for the preference optimization workflow.
Theoretical Basis
# Pseudo-code: Correctness-based preference pair construction
# `model.generate` and `evaluate` are placeholders for the sampler and the answer checker.
from itertools import islice, product

def construct_pairs(model, questions, num_candidates, max_pairs=15):
    for question in questions:
        # Generate N candidate responses
        responses = [model.generate(question) for _ in range(num_candidates)]
        # Evaluate correctness once per response
        is_correct = [evaluate(r, question.answer) for r in responses]
        correct = [r for r, ok in zip(responses, is_correct) if ok]
        incorrect = [r for r, ok in zip(responses, is_correct) if not ok]
        # Form (chosen, rejected) pairs, capped at max_pairs per question
        pairs = list(islice(product(correct, incorrect), max_pairs))
        yield question, pairs
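The pairing step can be exercised without a model by supplying the candidate responses directly. The following is a small runnable sketch; `build_pairs`, the sample responses, and the "Final answer:" convention used by `is_correct` are illustrative assumptions, not part of the InternVL codebase.

```python
from itertools import islice, product
from typing import Callable, List, Tuple


def build_pairs(
    responses: List[str],
    is_correct: Callable[[str], bool],
    max_pairs: int = 15,
) -> List[Tuple[str, str]]:
    """Pair every correct response with every incorrect one, capped at max_pairs."""
    correct = [r for r in responses if is_correct(r)]
    incorrect = [r for r in responses if not is_correct(r)]
    return list(islice(product(correct, incorrect), max_pairs))


def is_correct(response: str) -> bool:
    """Hypothetical checker: compare the text after 'Final answer:' to the ground truth."""
    return response.rsplit("Final answer:", 1)[-1].strip() == "12"


responses = [
    "6 + 6 = 12. Final answer: 12",
    "6 + 9 = 15. Final answer: 15",
    "3 * 4 = 12. Final answer: 12",
    "3 * 3 = 9. Final answer: 9",
]
pairs = build_pairs(responses, is_correct, max_pairs=3)
for chosen, rejected in pairs:
    print(f"chosen={chosen!r}  rejected={rejected!r}")
```

If every sampled response is correct, or every one is incorrect, the product is empty and the question contributes no pairs.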
The MMPR (Multimodal Mixed Preference) format:
# MMPR JSONL format (one record per line in the file; expanded here for readability)
{
    "question": "What is shown in this chart?",
    "chosen": "The chart shows quarterly revenue growth...",
    "rejected": "The chart displays employee headcount...",
    "image": "chart_001.png"
}
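Records with these four fields can be written and read back with the standard `json` module. This is a sketch under the assumption that the field set shown above is complete; `write_mmpr_jsonl` and `read_mmpr_jsonl` are hypothetical helper names, and an in-memory buffer stands in for a file on disk.

```python
import io
import json
from typing import Dict, Iterable, List, TextIO

REQUIRED_FIELDS = {"question", "chosen", "rejected", "image"}


def write_mmpr_jsonl(records: Iterable[Dict[str, str]], fp: TextIO) -> None:
    """Write one JSON object per line."""
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_mmpr_jsonl(fp: TextIO) -> List[Dict[str, str]]:
    """Read records back, rejecting any line that lacks a required field."""
    records = []
    for line_no, line in enumerate(fp, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        records.append(record)
    return records


records = [
    {
        "question": "What is shown in this chart?",
        "chosen": "The chart shows quarterly revenue growth...",
        "rejected": "The chart displays employee headcount...",
        "image": "chart_001.png",
    }
]
buffer = io.StringIO()
write_mmpr_jsonl(records, buffer)
buffer.seek(0)
loaded = read_mmpr_jsonl(buffer)
print(loaded[0]["image"])
```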