Implementation:Hiyouga LLaMA Factory DPO Zh Demo Data

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	NLP, Training_Data
Last Updated	2026-02-06 19:00 GMT

Overview

dpo_zh_demo.json provides Chinese preference data with chosen and rejected response pairs in ShareGPT format for demonstrating and testing Direct Preference Optimization (DPO) and reward modeling workflows in LLaMA Factory.

Description

The file contains a JSON array of Chinese conversation records. Each record has three fields: a conversations list (the multi-turn dialogue that precedes the response, in ShareGPT format with "from" and "value" fields per turn), a chosen object (the preferred assistant response), and a rejected object (the dispreferred assistant response). The conversations cover topics such as programming, data analysis, logical reasoning, financial analysis, and general knowledge, all in Simplified Chinese. Each chosen/rejected pair answers the same conversation and supplies the human preference signal for DPO and reward-model training.
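Because every record must carry all three fields, a quick structural check can catch malformed entries before training. The following is a minimal sketch that covers only the layout described above. check_record_shape is a hypothetical helper, not part of LLaMA Factory.

```python
from typing import Any


def check_record_shape(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one preference record (empty list if none).

    Only the top-level layout is checked: a non-empty "conversations" list of
    {"from", "value"} turns, plus "chosen" and "rejected" response objects.
    """
    problems: list[str] = []
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("conversations must be a non-empty list")
    else:
        for i, turn in enumerate(turns):
            # Each turn needs both the role field and the message text.
            if not isinstance(turn, dict) or not {"from", "value"}.issubset(turn):
                problems.append(f"conversations[{i}] needs 'from' and 'value'")
    for key in ("chosen", "rejected"):
        reply = record.get(key)
        if not isinstance(reply, dict) or not {"from", "value"}.issubset(reply):
            problems.append(f"{key} must be an object with 'from' and 'value'")
        elif not isinstance(reply["value"], str) or not reply["value"].strip():
            problems.append(f"{key}.value must be a non-empty string")
    return problems
```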

This dataset is registered in dataset_info.json with ShareGPT formatting and "ranking": true, and it uses the same column structure as its English counterpart.

Usage

This demo dataset is used for quick testing of DPO and reward-model training pipelines on Chinese text. Users reference it by name (dpo_zh_demo) together with --stage dpo or --stage rm to validate preference-based training.
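Since the dataset is selected by its registry key, a pre-flight check can confirm that the key exists and is flagged as a preference dataset before a preference stage is launched. This is a minimal sketch, assuming dataset_info.json has already been loaded into a dict and assuming that --stage dpo and --stage rm expect an entry with "ranking": true. check_dataset_for_stage and PREFERENCE_STAGES are hypothetical names, not part of LLaMA Factory.

```python
# Stages named on this page; assumed to require a ranking (preference) dataset.
PREFERENCE_STAGES = {"dpo", "rm"}


def check_dataset_for_stage(dataset_info: dict, name: str, stage: str) -> None:
    """Raise ValueError if `name` is unregistered or its ranking flag does not fit `stage`.

    Hypothetical pre-flight helper; the "ranking": true requirement is an assumption.
    """
    entry = dataset_info.get(name)
    if entry is None:
        raise ValueError(f"{name!r} is not registered in dataset_info.json")
    if stage in PREFERENCE_STAGES and not entry.get("ranking", False):
        raise ValueError(f"{name!r} lacks 'ranking': true, expected for --stage {stage}")


# Usage (path assumed relative to the repository root):
# import json
# with open("data/dataset_info.json", encoding="utf-8") as f:
#     check_dataset_for_stage(json.load(f), "dpo_zh_demo", "dpo")
```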

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: data/dpo_zh_demo.json

Data Format

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."
    },
    "rejected": {
      "from": "gpt",
      "value": "香港的银行业面临着诸多挑战，如低利率、高房价和经济疲软..."
    }
  }
]
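When preparing a custom Chinese preference file in the same layout, the main pitfall is encoding. The sketch below writes the truncated sample record above with json.dumps(..., ensure_ascii=False) and UTF-8, so the Chinese text stays human-readable instead of turning into \uXXXX escapes (both forms are valid JSON), and then reads it back. The file name my_dpo_zh.json and the helper write_preference_file are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# The sample record from the data format above (values truncated as in the original).
record = {
    "conversations": [
        {"from": "human", "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"}
    ],
    "chosen": {"from": "gpt", "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."},
    "rejected": {"from": "gpt", "value": "香港的银行业面临着诸多挑战，如低利率、高房价和经济疲软..."},
}


def write_preference_file(path: Path, records: list[dict]) -> None:
    # ensure_ascii=False keeps Chinese characters as-is; writing as UTF-8 preserves them.
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "my_dpo_zh.json"  # hypothetical file name
    write_preference_file(out, [record])
    text = out.read_text(encoding="utf-8")
    loaded = json.loads(text)

print(loaded[0]["chosen"]["value"][:10])
```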

I/O Contract

Schema

Field	Type	Required	Description
conversations	array	Yes	List of conversation turns in Chinese, each with `"from"` (human/system/gpt) and `"value"` (message text)
chosen	object	Yes	The preferred response with `"from": "gpt"` and `"value"` containing the chosen Chinese text
rejected	object	Yes	The dispreferred response with `"from": "gpt"` and `"value"` containing the rejected Chinese text
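The schema implies that chosen and rejected are two alternative continuations of the same conversations prompt. The sketch below makes that explicit by splitting a record into a shared prompt and two single-message replies. The mapping of human to user and gpt to assistant is an assumption borrowed from the common chat-message convention, not LLaMA Factory's internal representation, and to_preference_triple is a hypothetical name.

```python
# Assumed mapping from ShareGPT role tags to chat-message roles.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}


def to_preference_triple(record: dict) -> dict:
    """Split one record into a shared prompt plus the two candidate replies."""
    prompt = [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]

    def reply(key: str) -> list[dict]:
        # chosen and rejected each hold exactly one assistant message.
        return [{"role": ROLE_MAP[record[key]["from"]], "content": record[key]["value"]}]

    # Both replies are judged against the same prompt.
    return {"prompt": prompt, "chosen": reply("chosen"), "rejected": reply("rejected")}
```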

Conversation Turn Schema

Field	Type	Required	Description
from	string	Yes	Role identifier: `"human"`, `"gpt"`, or `"system"`
value	string	Yes	The message content in Chinese
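Within conversations, the order of roles matters for a preference record. Below is a sketch of an order check under assumed rules that this page does not state: an optional leading "system" turn, then "human" and "gpt" turns alternating, starting and ending with "human", so that the chosen and rejected replies each complete the dialogue. check_turn_order is a hypothetical helper.

```python
def check_turn_order(turns: list[dict]) -> bool:
    """Return True if turns follow the assumed order for a preference prompt.

    Assumed rules (hypothetical, not taken from LLaMA Factory's source):
    an optional leading "system" turn, then alternating "human"/"gpt" turns
    that start and end with "human", so the chosen/rejected reply comes next.
    """
    if turns and turns[0]["from"] == "system":
        turns = turns[1:]
    # An odd count is required: human, gpt, human, ..., human.
    if len(turns) % 2 == 0:
        return False
    expected = ("human", "gpt")
    return all(turn["from"] == expected[i % 2] for i, turn in enumerate(turns))
```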

Dataset Registry Entry

Property	Value
Key	`dpo_zh_demo`
file_name	`dpo_zh_demo.json`
formatting	sharegpt
ranking	true
columns.messages	conversations
columns.chosen	chosen
columns.rejected	rejected
File length	5058 lines
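The registry properties above correspond to one JSON object in dataset_info.json. The sketch below rebuilds that entry in Python and prints it as JSON, assuming the columns.* rows denote a nested "columns" object. The file length row is file metadata, not a registry field, so it is left out.

```python
import json

# Entry assembled from the registry table; "columns.*" rows become a nested object.
registry_entry = {
    "dpo_zh_demo": {
        "file_name": "dpo_zh_demo.json",
        "formatting": "sharegpt",
        "ranking": True,
        "columns": {
            "messages": "conversations",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }
}

# Python True serializes as JSON true.
print(json.dumps(registry_entry, indent=2))
```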

Usage Examples

# DPO training with this dataset via the LLaMA Factory CLI
# llamafactory-cli train \
#     --stage dpo \
#     --do_train \
#     --dataset dpo_zh_demo \
#     --model_name_or_path meta-llama/Llama-2-7b-hf \
#     --output_dir output/dpo_zh_demo

# Or reward model training
# llamafactory-cli train \
#     --stage rm \
#     --do_train \
#     --dataset dpo_zh_demo \
#     --model_name_or_path meta-llama/Llama-2-7b-hf \
#     --output_dir output/rm_zh_demo

# Loading the data manually for inspection
import json

with open("data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"Number of preference pairs: {len(data)}")
sample = data[0]
print(f"Conversation turns: {len(sample['conversations'])}")
print(f"Chosen response preview: {sample['chosen']['value'][:60]}...")
print(f"Rejected response preview: {sample['rejected']['value'][:60]}...")
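Extending the inspection above, a small summary can show how many turns the prompts have and whether chosen replies tend to be longer than rejected ones. This is a sketch that assumes data is the non-empty list loaded above (statistics.mean raises on an empty input). Lengths are counted in characters, not tokens, and summarize_pairs is a hypothetical helper.

```python
from collections import Counter
from statistics import mean


def summarize_pairs(data: list[dict]) -> dict:
    """Summarize preference records: turn-count histogram and mean reply lengths in characters."""
    turn_counts = Counter(len(r["conversations"]) for r in data)
    return {
        "pairs": len(data),
        # Maps number of prompt turns to how many records have that many.
        "turn_counts": dict(sorted(turn_counts.items())),
        "mean_chosen_chars": mean(len(r["chosen"]["value"]) for r in data),
        "mean_rejected_chars": mean(len(r["rejected"]["value"]) for r in data),
    }
```

With data loaded as above, summarize_pairs(data) returns the pair count, the turn-count histogram, and the two mean lengths.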

Related Pages

Hiyouga_LLaMA_Factory_DPO_En_Demo_Data - English version of the DPO demo dataset
Hiyouga_LLaMA_Factory_KTO_En_Demo_Data - Alternative preference format using binary labels
Hiyouga_LLaMA_Factory_Dataset_Info_Registry - Central dataset registry that indexes this file
Hiyouga_LLaMA_Factory_Alpaca_Zh_Demo_Data - Chinese SFT demo data (non-preference)
