
Implementation:Alibaba ROLL Comparison GPT4 Data Zh

From Leeroopedia


Knowledge Sources
Domains NLP, Preference_Data, DPO
Last Updated 2026-02-07 20:00 GMT

Overview

A large-scale Chinese-language DPO preference training dataset containing instruction/chosen/rejected triplets used for GPT-4 quality comparison and preference alignment.

Description

comparison_gpt4_data_zh.json is a JSON array consisting of approximately 218,648 lines of Chinese preference data. Each entry is an object with four fields: instruction (the user prompt or question in Chinese), input (optional additional context, often empty), chosen (the preferred high-quality response, typically at GPT-4 level), and rejected (a lower-quality or incorrect response). The dataset covers a wide range of topics including general knowledge, creative writing, code explanation, scientific reasoning, and everyday questions, all presented in Simplified Chinese.

This file follows the standard Alpaca-style preference data format and is suitable for Direct Preference Optimization (DPO) training, where the model learns to distinguish between high-quality and low-quality responses given the same instruction.

Usage

Use this dataset when performing DPO or RLHF preference alignment training on Chinese-language LLMs. It is particularly suited for:

  • Training a reward model that ranks Chinese responses by quality
  • Fine-tuning a language model with DPO to prefer GPT-4-quality Chinese outputs
  • Evaluating preference data pipelines in multi-domain Chinese NLP tasks
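Before DPO training, each entry usually needs to be folded into the prompt/chosen/rejected shape that common DPO trainers (e.g. trl's `DPOTrainer`) expect. The helper below is a minimal sketch under that assumption; the output column names are conventions of those trainers, not something defined by the ROLL repository itself.

```python
# Hedged sketch: convert an Alpaca-style preference entry into the
# prompt/chosen/rejected layout commonly used for DPO training.
# Field names follow the schema documented on this page.

def to_dpo_example(entry: dict) -> dict:
    # Fold the optional `input` context into the prompt when present.
    prompt = entry["instruction"]
    if entry.get("input"):
        prompt = f"{prompt}\n{entry['input']}"
    return {
        "prompt": prompt,
        "chosen": entry["chosen"],
        "rejected": entry["rejected"],
    }

# Illustrative entry (not taken from the actual file)
sample = {
    "instruction": "解释什么是机器学习。",
    "input": "",
    "chosen": "机器学习是人工智能的一个分支，通过数据训练模型来完成任务。",
    "rejected": "机器学习就是机器在学习。",
}
print(to_dpo_example(sample)["prompt"])
```

Mapping this function over the full JSON array yields a list ready for `Dataset.from_list` and a DPO training loop.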

Code Reference

Source Location

  • Repository: Alibaba_ROLL
  • File: data/comparison_gpt4_data_zh.json

Data Schema / Signature

[
  {
    "instruction": "string  -- The user prompt or question in Chinese",
    "input": "string  -- Optional additional context (often empty string)",
    "chosen": "string  -- The preferred high-quality response",
    "rejected": "string  -- The lower-quality or incorrect response"
  }
]
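Given that all four fields are plain strings, a quick validation pass over the loaded array can catch malformed entries before training. This is a minimal sketch based on the signature above, not a validator shipped with the repository.

```python
# Hedged sketch: check that an entry matches the documented schema,
# i.e. all four fields are present and are strings.
REQUIRED_FIELDS = ("instruction", "input", "chosen", "rejected")

def validate_entry(entry: dict) -> bool:
    return all(isinstance(entry.get(f), str) for f in REQUIRED_FIELDS)

# Illustrative entries (not taken from the actual file)
entries = [
    {"instruction": "你好", "input": "", "chosen": "您好！", "rejected": "hi"},
    {"instruction": "问题", "chosen": "答案"},  # missing fields -> invalid
]
print([validate_entry(e) for e in entries])  # [True, False]
```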

I/O Contract

Inputs

Field       | Type   | Required | Description
instruction | string | Yes      | The Chinese-language instruction or question prompt
input       | string | No       | Additional context for the instruction; often an empty string

Outputs

Field    | Type   | Description
chosen   | string | The preferred, high-quality response (GPT-4 level)
rejected | string | The less preferred, lower-quality response

Usage Examples

import json

# Load the Chinese DPO preference dataset
with open("data/comparison_gpt4_data_zh.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

print(f"Total entries: {len(dataset)}")

# Inspect a single entry
entry = dataset[0]
print(f"Instruction: {entry['instruction']}")
print(f"Input: {entry['input']}")
print(f"Chosen: {entry['chosen'][:100]}...")
print(f"Rejected: {entry['rejected'][:100]}...")

# Convert to Hugging Face Dataset format for DPO training
from datasets import Dataset

hf_dataset = Dataset.from_list(dataset)
print(hf_dataset)
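For training, it is common to hold out a small evaluation slice of the preference data. The snippet below is an illustrative sketch using a plain Python shuffle-and-split with a fixed seed; the 2% evaluation fraction is an assumption, not a value prescribed by the ROLL repository.

```python
import random

# Hedged sketch: deterministically split the loaded list of entries
# into train/eval subsets before DPO training.
def split_dataset(entries, eval_frac=0.02, seed=42):
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_frac))
    return shuffled[n_eval:], shuffled[:n_eval]

# Illustrative stand-in data
data = [{"instruction": str(i)} for i in range(100)]
train, eval_set = split_dataset(data)
print(len(train), len(eval_set))  # 98 2
```

Using a fixed seed keeps the split reproducible across runs, which matters when comparing DPO checkpoints trained on the same data.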
