Implementation:OpenGVLab InternVL Correctness Build Data
| Knowledge Sources | |
|---|---|
| Domains | Alignment, Data_Engineering |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool for constructing correctness-based preference data, implemented as shell-script pipelines in the InternVL MPO data construction framework.
Description
The correctness_build_data.sh script orchestrates the preference data construction pipeline. It runs mmpr_data_pipeline_correctness_postprocess.py once per tile configuration (--max-num values 1, 6, 12, 18, and 24) to generate MMPR-format preference pairs from model-generated reasoning samples.
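The pairing idea behind correctness-based preference data can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `build_pairs`, the sample keys `response` and `is_correct`, and the random selection of pairs are all assumptions made for the example. Only the `question`, `chosen`, and `rejected` output fields and the per-question cap of 15 come from this page.

```python
import itertools
import random
from collections import defaultdict


def build_pairs(samples, num_pairs_per_key=15, seed=0):
    """Pair correct responses with incorrect ones, capped per question.

    `samples` is a list of dicts with hypothetical keys:
    "question", "response", and "is_correct".
    """
    rng = random.Random(seed)

    # Group responses by question and split them by correctness.
    by_key = defaultdict(lambda: {"correct": [], "incorrect": []})
    for sample in samples:
        bucket = "correct" if sample["is_correct"] else "incorrect"
        by_key[sample["question"]][bucket].append(sample["response"])

    pairs = []
    for question, group in by_key.items():
        # Every (correct, incorrect) combination is a candidate pair.
        candidates = list(itertools.product(group["correct"], group["incorrect"]))
        rng.shuffle(candidates)
        # Keep at most num_pairs_per_key pairs for this question.
        for chosen, rejected in candidates[:num_pairs_per_key]:
            pairs.append({"question": question, "chosen": chosen, "rejected": rejected})
    return pairs


samples = [
    {"question": "q1", "response": "A", "is_correct": True},
    {"question": "q1", "response": "B", "is_correct": False},
    {"question": "q1", "response": "C", "is_correct": False},
    {"question": "q2", "response": "D", "is_correct": True},
]
pairs = build_pairs(samples, num_pairs_per_key=1)
```

In this sketch, a question whose samples are all correct (or all incorrect) yields no pairs, because there is nothing to contrast.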
A companion script, visualprm_build_data.sh, constructs VisualPRM data using Monte Carlo step-level reward estimation.
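Monte Carlo step-level reward estimation can be illustrated with a short sketch. It assumes that each reasoning step has a set of sampled continuations (rollouts), that the step's score is the fraction of rollouts reaching a correct final answer, and that a step counts as positive when its score is strictly greater than the threshold. The function names, the data layout, and the strict comparison are assumptions for illustration; the repository's postprocessing script may differ.

```python
def mc_score(rollout_outcomes):
    """Fraction of rollouts from a step that reached a correct final answer."""
    if not rollout_outcomes:
        raise ValueError("at least one rollout is required per step")
    return sum(rollout_outcomes) / len(rollout_outcomes)


def label_steps(step_rollouts, mc_threshold=0.0):
    """Label each step '+' if its Monte Carlo score exceeds the threshold."""
    labels = []
    for outcomes in step_rollouts:
        score = mc_score(outcomes)
        labels.append({"mc_score": score, "label": "+" if score > mc_threshold else "-"})
    return labels


# One list of rollout outcomes per reasoning step (True = correct final answer).
step_rollouts = [
    [True, True, False, True],
    [False, True, False, False],
    [False, False, False, False],
]
labels = label_steps(step_rollouts, mc_threshold=0.0)
```

Under these assumptions, a threshold of 0.0 marks a step as negative only when none of its rollouts recover the correct answer.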
Usage
Run these scripts before MPO training to generate the preference dataset. They require pre-generated model sampling outputs.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/shell/internvl3.0/mpo_data_construction/correctness_build_data.sh
- Lines: L1-34
Signature
# correctness_build_data.sh
# Iterates over tile configurations and runs postprocessing
for MAX_NUM in 1 6 12 18 24; do
    python tools/reasoning_data_pipeline/mmpr_data_pipeline_correctness_postprocess.py \
        --num-pairs-per-key 15 \
        --max-lines 1200000 \
        --answer-fix \
        --max-num "$MAX_NUM" \
        --input-dir "$INPUT_DIR" \
        --output-dir "$OUTPUT_DIR"
done

# visualprm_build_data.sh
# Iterates over the same tile configurations for VisualPRM postprocessing
for MAX_NUM in 1 6 12 18 24; do
    python tools/reasoning_data_pipeline/visualprm_data_pipeline_postprocess.py \
        --mc-threshold 0.0 \
        --max-num "$MAX_NUM" \
        --input-dir "$INPUT_DIR" \
        --output-dir "$OUTPUT_DIR"
done
Import
cd internvl_chat
bash shell/internvl3.0/mpo_data_construction/correctness_build_data.sh
bash shell/internvl3.0/visualprm_data_construction/visualprm_build_data.sh
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Model sampling outputs | Directory | Yes | Pre-generated reasoning samples from the model |
| --num-pairs-per-key | int | No | Max preference pairs per question (default 15) |
| --max-lines | int | No | Per-tile-config sample limit (default 1200000) |
| --mc-threshold | float | No | Monte Carlo threshold for VisualPRM (default 0.0) |
Outputs
| Name | Type | Description |
|---|---|---|
| MMPR JSONL | Files | Preference records with question, chosen, and rejected fields, plus the associated images |
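A quick sanity check on the generated JSONL might look like the sketch below. It assumes one JSON object per line with the `question`, `chosen`, and `rejected` fields listed above; the `image` key, the example values, and the helper `check_record` are hypothetical and only illustrate the shape of a record.

```python
import json

# Fields named in the Outputs table; other keys (such as "image") are assumptions.
REQUIRED_FIELDS = ("question", "chosen", "rejected")


def check_record(line):
    """Parse one JSONL line and verify it looks like a preference record."""
    record = json.loads(line)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected responses are identical")
    return record


# Illustrative record; real files may carry additional or differently named keys.
example_line = json.dumps({
    "image": "images/0001.jpg",
    "question": "How many apples are on the table?",
    "chosen": "There are three apples, so the answer is 3.",
    "rejected": "There are two apples, so the answer is 2.",
})
record = check_record(example_line)
```

Running such a check over a few lines of each output file is a cheap way to catch empty or malformed records before MPO training starts.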
Usage Examples
Build Correctness Preference Data
cd internvl_chat
# Step 1: Generate model samples (prerequisite)
bash shell/internvl3.0/mpo_data_construction/correctness_mmpr_8b.sh
# Step 2: Build preference pairs from samples
bash shell/internvl3.0/mpo_data_construction/correctness_build_data.sh
# Step 3: Build VisualPRM data
bash shell/internvl3.0/visualprm_data_construction/visualprm_build_data.sh