Implementation: Alibaba ROLL DPO Zh Demo Dataset
| Knowledge Sources | |
|---|---|
| Domains | NLP, Preference_Data, DPO |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A Chinese-language DPO training demo dataset containing conversation/chosen/rejected preference pairs in ShareGPT format for preference alignment training.
Description
`dpo_zh_demo.json` is a 5,058-line JSON array of preference-annotated Chinese conversation data designed for Direct Preference Optimization (DPO) training. Each entry uses the ShareGPT conversation format with three fields:
- `conversations` -- A list of conversation turns, each with a `"from"` field (typically `"human"`) and a `"value"` field containing the user's Chinese prompt or question.
- `chosen` -- An object with `"from": "gpt"` and a `"value"` field containing the preferred, higher-quality assistant response.
- `rejected` -- An object with `"from": "gpt"` and a `"value"` field containing the less preferred, lower-quality assistant response.
The dataset covers diverse Chinese-language topics including data analysis (e.g., Congressional retweet analysis with Python code), financial analysis (e.g., Hong Kong banking sector), logical reasoning, and general knowledge. Chosen responses tend to be detailed, well-structured, and accurate, while rejected responses are typically shorter, less precise, or use incorrect methodology.
This file is registered in the `dataset_info.json` registry under the key `"dpo_zh_demo"` with `ranking: true` and `formatting: "sharegpt"`.
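A quick sanity check on that registry entry, as a Python sketch; the path to `dataset_info.json` below is an assumption, so adjust it to wherever the registry lives in your checkout:

```python
import json

# Sketch: verify the registry flags described above. The registry path is an
# assumption; point it at wherever dataset_info.json sits in your checkout.
with open("mcore_adapter/examples/data/dataset_info.json", "r", encoding="utf-8") as f:
    registry = json.load(f)

entry = registry["dpo_zh_demo"]
assert entry["ranking"] is True           # marks the file as preference (ranking) data
assert entry["formatting"] == "sharegpt"  # parsed with the ShareGPT schema
print(entry)
```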
Usage
Use this dataset as a demo or starting point for:
- DPO preference alignment training on Chinese-language LLMs
- Testing DPO data loading pipelines in the LLaMA-Factory / mcore_adapter framework
- Validating ShareGPT-format preference data parsing (see the sketch after this list)
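For the last item, a minimal validation sketch; `validate_dpo_entry` is a hypothetical helper written against the schema documented below:

```python
import json

def validate_dpo_entry(entry: dict) -> bool:
    """Hypothetical helper: check one record against the ShareGPT preference schema."""
    turns = entry.get("conversations")
    if not isinstance(turns, list) or not turns:
        return False
    if any("from" not in turn or "value" not in turn for turn in turns):
        return False
    for key in ("chosen", "rejected"):
        response = entry.get(key)
        if not isinstance(response, dict) or response.get("from") != "gpt" or "value" not in response:
            return False
    return True

with open("mcore_adapter/examples/data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    data = json.load(f)

bad = [i for i, e in enumerate(data) if not validate_dpo_entry(e)]
print(f"{len(data)} entries checked, {len(bad)} failed validation")
```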
Code Reference
Source Location
- Repository: Alibaba_ROLL
- File: `mcore_adapter/examples/data/dpo_zh_demo.json`
Data Schema / Signature
```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "string -- The user's Chinese prompt or question"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "string -- The preferred high-quality response"
    },
    "rejected": {
      "from": "gpt",
      "value": "string -- The less preferred lower-quality response"
    }
  }
]
```
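Python loaders can mirror this schema with typed dictionaries; the type names below are invented for illustration, and the functional `TypedDict` form is required because `from` is a Python keyword:

```python
from typing import List, TypedDict

# "from" is a reserved word, so the turn type must use the functional form.
Turn = TypedDict("Turn", {"from": str, "value": str})

class DpoEntry(TypedDict):
    conversations: List[Turn]  # user turns ("from": "human")
    chosen: Turn               # preferred response ("from": "gpt")
    rejected: Turn             # dispreferred response ("from": "gpt")

# The file as a whole parses to List[DpoEntry].
```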
I/O Contract
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| conversations | array of objects | Yes | List of conversation turns; each turn has "from" (role) and "value" (content) |
| conversations[].from | string | Yes | Speaker role, typically "human" for user turns |
| conversations[].value | string | Yes | The Chinese text content of the conversation turn |
Outputs
| Field | Type | Description |
|---|---|---|
| chosen | object | The preferred response with "from" ("gpt") and "value" (high-quality Chinese response) |
| rejected | object | The rejected response with "from" ("gpt") and "value" (lower-quality Chinese response) |
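Trainers that expect plain text triplets rather than ShareGPT objects need a small flattening step. A sketch, where `to_dpo_triplet` is a hypothetical helper and the newline join is one possible convention for multi-turn prompts:

```python
def to_dpo_triplet(entry: dict) -> tuple[str, str, str]:
    """Flatten one record into (prompt, chosen, rejected) strings."""
    # The schema above shows a single human turn, but joining also covers
    # entries that carry more than one conversation turn.
    prompt = "\n".join(turn["value"] for turn in entry["conversations"])
    return prompt, entry["chosen"]["value"], entry["rejected"]["value"]
```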
Usage Examples
```python
import json

# Load the Chinese DPO demo dataset
with open("mcore_adapter/examples/data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    dpo_data = json.load(f)

print(f"Total preference pairs: {len(dpo_data)}")

# Inspect one entry
entry = dpo_data[0]
user_prompt = entry["conversations"][0]["value"]
chosen_response = entry["chosen"]["value"]
rejected_response = entry["rejected"]["value"]

print(f"User prompt: {user_prompt[:80]}...")
print(f"Chosen length: {len(chosen_response)} chars")
print(f"Rejected length: {len(rejected_response)} chars")

# Use with the LLaMA-Factory dataset_info.json registry:
# in your training config YAML, reference the dataset by name:
#   dataset: dpo_zh_demo
# The data loader will look up "dpo_zh_demo" in dataset_info.json and apply
# the configured formatting ("sharegpt") and column mappings automatically.
```
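For completeness, an illustrative training-config fragment; only the `dataset` key is documented above, and the remaining keys are LLaMA-Factory-style placeholders that may differ in mcore_adapter:

```yaml
# Illustrative fragment only; consult the framework's own example configs.
stage: dpo                         # preference-optimization stage (assumed key)
dataset: dpo_zh_demo               # resolved via the dataset_info.json registry
output_dir: ./outputs/dpo_zh_demo  # placeholder
```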