Implementation:Hiyouga LLaMA Factory DPO Zh Demo Data
| Knowledge Sources | |
|---|---|
| Domains | NLP, Training_Data |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
dpo_zh_demo.json provides Chinese preference data with chosen and rejected response pairs in ShareGPT format for demonstrating and testing Direct Preference Optimization (DPO) and reward modeling workflows in LLaMA Factory.
Description
The file contains a JSON array of Chinese conversation records. Each record consists of a conversations list (multi-turn dialogue in ShareGPT format with "from" and "value" fields), a chosen object (the preferred assistant response), and a rejected object (the dispreferred assistant response). The conversations cover topics such as programming, data analysis, logical reasoning, financial analysis, and general knowledge, all in Simplified Chinese. Together, the chosen and rejected responses supply the preference signal used in DPO training.
This dataset is registered in dataset_info.json with "ranking": true and "formatting": "sharegpt", and it uses the same column structure as its English counterpart.
Usage
This demo dataset is intended for quick tests of preference-based training pipelines on Chinese text. Reference it by name (dpo_zh_demo) together with --stage dpo for DPO training or --stage rm for reward modeling.
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: data/dpo_zh_demo.json
Data Format
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."
    },
    "rejected": {
      "from": "gpt",
      "value": "香港的银行业面临着诸多挑战,如低利率、高房价和经济疲软..."
    }
  }
]
I/O Contract
Schema
| Field | Type | Required | Description |
|---|---|---|---|
| conversations | array | Yes | List of conversation turns in Chinese, each with "from" (human/system/gpt) and "value" (message text) |
| chosen | object | Yes | The preferred response with "from": "gpt" and "value" containing the chosen Chinese text |
| rejected | object | Yes | The dispreferred response with "from": "gpt" and "value" containing the rejected Chinese text |
Conversation Turn Schema
| Field | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Role identifier: "human", "gpt", or "system" |
| value | string | Yes | The message content in Chinese |
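The two schema tables above can be checked mechanically before a training run. The following is a minimal sketch, not part of LLaMA Factory: validate_record and ALLOWED_ROLES are hypothetical names introduced here, and the requirement that conversations be non-empty is an assumption rather than something the schema states.

```python
from typing import Any

# Role identifiers listed in the Conversation Turn Schema table.
ALLOWED_ROLES = {"human", "gpt", "system"}


def _is_turn(turn: Any) -> bool:
    """Return True if `turn` is a dict with a known "from" role and a string "value"."""
    return (
        isinstance(turn, dict)
        and isinstance(turn.get("from"), str)
        and turn["from"] in ALLOWED_ROLES
        and isinstance(turn.get("value"), str)
    )


def validate_record(record: dict) -> list[str]:
    """Collect schema problems for one preference record; an empty list means it passed."""
    problems = []
    conversations = record.get("conversations")
    # Assumption: an empty conversation list is treated as invalid.
    if not isinstance(conversations, list) or not conversations:
        problems.append("conversations must be a non-empty list")
    elif not all(_is_turn(turn) for turn in conversations):
        problems.append("every turn needs a known 'from' role and a string 'value'")
    for key in ("chosen", "rejected"):
        response = record.get(key)
        if not _is_turn(response) or response["from"] != "gpt":
            problems.append(f"{key} must be an object with from='gpt' and a string value")
    return problems


example = {
    "conversations": [{"from": "human", "value": "你好"}],
    "chosen": {"from": "gpt", "value": "你好,有什么可以帮您?"},
    "rejected": {"from": "gpt", "value": "嗯。"},
}
print(validate_record(example))
```

Running validate_record over every element of the loaded JSON array and reporting the indices with non-empty results is usually enough to catch malformed records in a custom dataset that follows this layout.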
Dataset Registry Entry
| Property | Value |
|---|---|
| Key | dpo_zh_demo |
| file_name | dpo_zh_demo.json |
| formatting | sharegpt |
| ranking | true |
| columns.messages | conversations |
| columns.chosen | chosen |
| columns.rejected | rejected |
| Lines | 5058 |
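To make the registry properties concrete, the sketch below rebuilds the entry as a Python dictionary and prints it as JSON. It is reconstructed only from the table above (excluding the Lines row, which describes the data file rather than the registry), so the key order and the absence of any other optional keys are assumptions; check dataset_info.json in the repository for the authoritative entry.

```python
import json

# Reconstructed from the registry table; not copied from dataset_info.json.
registry_entry = {
    "dpo_zh_demo": {
        "file_name": "dpo_zh_demo.json",
        "formatting": "sharegpt",
        "ranking": True,  # marks the dataset as chosen/rejected preference data
        "columns": {
            "messages": "conversations",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }
}

# ensure_ascii=False keeps any non-ASCII text readable if the entry is extended.
print(json.dumps(registry_entry, indent=2, ensure_ascii=False))
```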
Usage Examples
# Reference the dataset in a LLaMA Factory training config for DPO
# llamafactory-cli train \
# --dataset dpo_zh_demo \
# --stage dpo \
# --model_name_or_path meta-llama/Llama-2-7b-hf \
# --output_dir output/dpo_zh_demo
# Or for reward model training
# llamafactory-cli train \
# --dataset dpo_zh_demo \
# --stage rm \
# --model_name_or_path meta-llama/Llama-2-7b-hf \
# --output_dir output/rm_zh_demo
# Loading the data manually for inspection
import json

with open("data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    data = json.load(f)  # a list of preference records

print(f"Number of preference pairs: {len(data)}")
sample = data[0]
print(f"Conversation turns: {len(sample['conversations'])}")
print(f"Chosen response preview: {sample['chosen']['value'][:60]}...")
print(f"Rejected response preview: {sample['rejected']['value'][:60]}...")
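When reviewing preference pairs by eye, it can help to flatten each record into a prompt/chosen/rejected triple. The sketch below does that for the record shown under Data Format, with the response text left truncated as in that example. The to_triple helper and its "role: text" prompt layout are hypothetical and meant only for inspection; they do not reflect how LLaMA Factory applies chat templates during training.

```python
def to_triple(record: dict) -> dict:
    """Flatten one record into prompt, chosen, and rejected strings for inspection."""
    # Join all turns into one string, one "role: text" line per turn.
    prompt = "\n".join(f"{turn['from']}: {turn['value']}" for turn in record["conversations"])
    return {
        "prompt": prompt,
        "chosen": record["chosen"]["value"],
        "rejected": record["rejected"]["value"],
    }


# The example record from the Data Format section (values truncated as shown there).
record = {
    "conversations": [
        {"from": "human", "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"}
    ],
    "chosen": {"from": "gpt", "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."},
    "rejected": {"from": "gpt", "value": "香港的银行业面临着诸多挑战,如低利率、高房价和经济疲软..."},
}

triple = to_triple(record)
for name, text in triple.items():
    print(f"{name}: {text}")
```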
Related Pages
- Hiyouga_LLaMA_Factory_DPO_En_Demo_Data - English version of the DPO demo dataset
- Hiyouga_LLaMA_Factory_KTO_En_Demo_Data - Alternative preference format using binary labels
- Hiyouga_LLaMA_Factory_Dataset_Info_Registry - Central dataset registry that indexes this file
- Hiyouga_LLaMA_Factory_Alpaca_Zh_Demo_Data - Chinese SFT demo data (non-preference)