Principle:OpenGVLab InternVL Preference Data Construction

Knowledge Sources	InternVL 2.5 InternVL
Domains	Alignment, Data_Engineering, Vision_Language
Last Updated	2026-02-07 00:00 GMT

Overview

A data construction pipeline that generates preference pairs (chosen/rejected responses) for training multimodal models with preference optimization.

Description

Preference data construction creates training data for alignment with Direct Preference Optimization (DPO) or Mixed Preference Optimization (MPO). It samples multiple candidate responses per question and pairs correct responses (chosen) with incorrect ones (rejected). InternVL uses two complementary approaches:

Correctness-based pairs: The model generates multiple reasoning chains for each question. Responses that arrive at the correct answer become chosen; incorrect responses become rejected. This teaches the model to prefer correct reasoning.

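VisualPRM (Process Reward Model) pairs: Step-level reward scores are estimated with Monte Carlo sampling. Several continuations are sampled from an intermediate reasoning step, and the step's score reflects how often those continuations reach the correct answer, so steps that lead to correct outcomes receive higher rewards. This provides a fine-grained preference signal at the level of individual reasoning steps.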
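The Monte Carlo step scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VisualPRM implementation: `monte_carlo_step_scores`, `rollout_is_correct`, and `num_rollouts` are hypothetical names, and the rollout function stands in for sampling one continuation from the model and checking its final answer.

```python
from typing import Callable, List


def monte_carlo_step_scores(
    steps: List[str],
    rollout_is_correct: Callable[[List[str]], bool],
    num_rollouts: int = 16,
) -> List[float]:
    """Score each step by the fraction of rollouts from its prefix that succeed.

    `rollout_is_correct(prefix)` is a hypothetical callback: it should sample
    one continuation of `prefix` and return True if the final answer is correct.
    """
    scores = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        hits = sum(rollout_is_correct(prefix) for _ in range(num_rollouts))
        scores.append(hits / num_rollouts)
    return scores


def toy_rollout(prefix: List[str]) -> bool:
    # Deterministic stand-in for model sampling (assumption for this demo):
    # a continuation succeeds unless the prefix already contains a flawed step.
    return not any(step.startswith("FLAWED") for step in prefix)


steps = [
    "Read the chart axes",
    "Sum the Q1 and Q2 bars",
    "FLAWED: misread the unit",
]
scores = monte_carlo_step_scores(steps, toy_rollout, num_rollouts=4)
```

In this toy run the two prefixes that end before the flawed step score 1.0 and the prefix that contains it scores 0.0. With a real model the rollouts are stochastic, so each score would typically be a fraction between 0 and 1.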

The pipeline processes samples across multiple dynamic resolution configurations (tile counts: 1, 6, 12, 18, 24) to ensure diverse visual inputs.

Usage

Use preference data construction before MPO training to generate the MMPR-format preference dataset. This is a prerequisite for the preference optimization workflow.

Theoretical Basis

# Pseudo-code: Correctness-based preference pair construction
# `model.generate` and `evaluate` are placeholders for the sampling
# and answer-checking steps.
from itertools import islice, product

def construct_pairs(model, questions, num_candidates, max_pairs=15):
    for question in questions:
        # Generate num_candidates candidate responses
        responses = [model.generate(question) for _ in range(num_candidates)]

        # Evaluate correctness once per response
        correct, incorrect = [], []
        for response in responses:
            if evaluate(response, question.answer):
                correct.append(response)
            else:
                incorrect.append(response)

        # Form (chosen, rejected) pairs, capped at max_pairs per question
        pairs = list(islice(product(correct, incorrect), max_pairs))

        yield question, pairs

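Each pair is stored as one record in the MMPR preference dataset, one JSON object per line (JSONL). The example below is pretty-printed for readability: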

# MMPR JSONL format
{
    "question": "What is shown in this chart?",
    "chosen": "The chart shows quarterly revenue growth...",
    "rejected": "The chart displays employee headcount...",
    "image": "chart_001.png"
}
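As a rough sketch of how (chosen, rejected) pairs could be serialized into records with the fields shown above: the helper names `to_mmpr_records` and `dumps_jsonl` are hypothetical, and the real MMPR files may carry additional fields that this page does not list.

```python
import json
from typing import Dict, Iterable, List, Tuple


def to_mmpr_records(
    question: str, image: str, pairs: Iterable[Tuple[str, str]]
) -> List[Dict[str, str]]:
    """Turn (chosen, rejected) pairs for one question into record dicts."""
    return [
        {"question": question, "chosen": chosen, "rejected": rejected, "image": image}
        for chosen, rejected in pairs
    ]


def dumps_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


records = to_mmpr_records(
    question="What is shown in this chart?",
    image="chart_001.png",
    pairs=[
        (
            "The chart shows quarterly revenue growth...",
            "The chart displays employee headcount...",
        )
    ],
)
jsonl_text = dumps_jsonl(records)
```

Writing `jsonl_text` to a file gives one line per preference pair, so a loader can parse each line independently with `json.loads`.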

Related Pages

Implemented By

Implementation:OpenGVLab_InternVL_Correctness_Build_Data
