Implementation:Alibaba ROLL Comparison GPT4 Data Zh
| Knowledge Sources | Details |
|---|---|
| Domains | NLP, Preference_Data, DPO |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A large-scale Chinese-language DPO preference training dataset containing instruction/chosen/rejected triplets used for GPT-4 quality comparison and preference alignment.
Description
comparison_gpt4_data_zh.json is a JSON array of approximately 218,648 Chinese preference entries. Each entry is an object with four fields: instruction (the user prompt or question in Chinese), input (optional additional context, often empty), chosen (the preferred high-quality response, typically at GPT-4 level), and rejected (a lower-quality or incorrect response). The dataset covers a wide range of topics, including general knowledge, creative writing, code explanation, scientific reasoning, and everyday questions, all in Simplified Chinese.
This file follows the standard Alpaca-style preference data format and is suitable for Direct Preference Optimization (DPO) training, where the model learns to distinguish between high-quality and low-quality responses given the same instruction.
Usage
Use this dataset when performing DPO or RLHF preference alignment training on Chinese-language LLMs. It is particularly suited for:
- Training a reward model that ranks Chinese responses by quality
- Fine-tuning a language model with DPO to prefer GPT-4-quality Chinese outputs
- Evaluating preference data pipelines in multi-domain Chinese NLP tasks
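For DPO training, entries are commonly flattened into prompt/chosen/rejected triplets. A minimal sketch of that conversion, assuming the four-field schema described above (the convention of appending the optional input to the instruction is an assumption, not prescribed by the dataset):

```python
def to_dpo_record(entry: dict) -> dict:
    # Combine instruction and optional input into a single prompt;
    # the newline-joining convention here is an assumption.
    prompt = entry["instruction"]
    if entry.get("input"):
        prompt = f"{prompt}\n{entry['input']}"
    return {
        "prompt": prompt,
        "chosen": entry["chosen"],
        "rejected": entry["rejected"],
    }

sample = {
    "instruction": "解释什么是机器学习",
    "input": "",
    "chosen": "机器学习是让计算机从数据中学习规律的方法。",
    "rejected": "不知道。",
}
record = to_dpo_record(sample)
```

When input is empty, the prompt is simply the instruction itself.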
Code Reference
Source Location
- Repository: Alibaba_ROLL
- File:
data/comparison_gpt4_data_zh.json
Data Schema / Signature
```json
[
  {
    "instruction": "string -- The user prompt or question in Chinese",
    "input": "string -- Optional additional context (often empty string)",
    "chosen": "string -- The preferred high-quality response",
    "rejected": "string -- The lower-quality or incorrect response"
  }
]
```
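A simple way to check loaded entries against this schema is to verify that all four fields are present and are strings (a sketch; the field names come from the schema above, the helper itself is illustrative):

```python
REQUIRED_FIELDS = ("instruction", "input", "chosen", "rejected")

def validate_entry(entry: object) -> bool:
    # An entry is valid when it is a dict whose four schema fields
    # are all present and are strings ("input" may be empty).
    return isinstance(entry, dict) and all(
        isinstance(entry.get(field), str) for field in REQUIRED_FIELDS
    )

good = {"instruction": "问题", "input": "", "chosen": "好答案", "rejected": "差答案"}
bad = {"instruction": "问题", "chosen": "好答案"}  # missing "input" and "rejected"
```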
I/O Contract
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| instruction | string | Yes | The Chinese-language instruction or question prompt |
| input | string | No | Additional context for the instruction; often an empty string |
Outputs
| Field | Type | Description |
|---|---|---|
| chosen | string | The preferred, high-quality response (GPT-4 level) |
| rejected | string | The less preferred, lower-quality response |
Usage Examples
```python
import json

from datasets import Dataset

# Load the Chinese DPO preference dataset
with open("data/comparison_gpt4_data_zh.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)
print(f"Total entries: {len(dataset)}")

# Inspect a single entry
entry = dataset[0]
print(f"Instruction: {entry['instruction']}")
print(f"Input: {entry['input']}")
print(f"Chosen: {entry['chosen'][:100]}...")
print(f"Rejected: {entry['rejected'][:100]}...")

# Convert to a Hugging Face Dataset for DPO training
hf_dataset = Dataset.from_list(dataset)
print(hf_dataset)
```
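Before training, it is common to hold out a small evaluation split. A minimal, deterministic sketch over the loaded JSON list (the helper name and 1% default are illustrative choices, not part of the dataset):

```python
import random

def train_eval_split(entries, eval_fraction=0.01, seed=0):
    # Deterministic shuffle-and-split of the loaded entry list into
    # train and held-out eval portions for DPO training.
    rng = random.Random(seed)
    indices = list(range(len(entries)))
    rng.shuffle(indices)
    n_eval = max(1, int(len(entries) * eval_fraction))
    eval_set = [entries[i] for i in indices[:n_eval]]
    train_set = [entries[i] for i in indices[n_eval:]]
    return train_set, eval_set

dummy = [
    {"instruction": str(i), "input": "", "chosen": "a", "rejected": "b"}
    for i in range(100)
]
train, evals = train_eval_split(dummy, eval_fraction=0.1)
```

Fixing the seed keeps the split reproducible across runs.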