

Implementation:Hiyouga LLaMA Factory DPO Zh Demo Data



Knowledge Sources
Domains NLP, Training_Data
Last Updated 2026-02-06 19:00 GMT

Overview

dpo_zh_demo.json provides Chinese preference data with chosen and rejected response pairs in ShareGPT format for demonstrating and testing Direct Preference Optimization (DPO) and reward modeling workflows in LLaMA Factory.

Description

The file contains a JSON array of Chinese conversation records. Each record consists of a conversations list (multi-turn dialogue in ShareGPT format with "from" and "value" fields), a chosen object (the preferred assistant response), and a rejected object (the dispreferred assistant response). The conversations cover topics such as programming, data analysis, logical reasoning, financial analysis, and general knowledge, all in Simplified Chinese. Together, the chosen and rejected responses supply the preference signal for DPO training.

This dataset is registered in dataset_info.json with "ranking": true and ShareGPT formatting, with the same column structure as its English counterpart.

Usage

This demo dataset is used for quick testing of DPO training pipelines for Chinese language models. Users reference it by name (dpo_zh_demo) with --stage dpo or --stage rm to validate preference-based training with Chinese text.

Code Reference

Source Location

Data Format

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."
    },
    "rejected": {
      "from": "gpt",
      "value": "香港的银行业面临着诸多挑战,如低利率、高房价和经济疲软..."
    }
  }
]
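A record like the one above can be split into a prompt and a pair of candidate responses, which is the shape preference-based training works with. The following is a minimal sketch of that split, not LLaMA Factory's own loader; the function name `to_preference_triple` and the `role`/`content` output keys are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def to_preference_triple(record: Dict) -> Tuple[List[Dict[str, str]], str, str]:
    """Split one record into (prompt turns, chosen text, rejected text).

    Hypothetical helper for inspection only; the "role"/"content" keys
    are an assumed output shape, not part of the dataset file.
    """
    prompt_turns = [
        {"role": turn["from"], "content": turn["value"]}
        for turn in record["conversations"]
    ]
    return prompt_turns, record["chosen"]["value"], record["rejected"]["value"]


# Sample record copied from the Data Format example (values are truncated there).
record = {
    "conversations": [
        {"from": "human", "value": "请提供一下对香港银行业的分析以及目前面临的挑战。"}
    ],
    "chosen": {"from": "gpt", "value": "香港银行业长期以来一直是该城市金融服务业和整体经济的重要组成部分..."},
    "rejected": {"from": "gpt", "value": "香港的银行业面临着诸多挑战,如低利率、高房价和经济疲软..."},
}

prompt, chosen, rejected = to_preference_triple(record)
print(f"Prompt turns: {len(prompt)}, first role: {prompt[0]['role']}")
```

Both responses answer the same prompt, so the only difference a trainer sees between the two sequences is the final assistant turn.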

I/O Contract

Schema

| Field | Type | Required | Description |
|---|---|---|---|
| conversations | array | Yes | List of conversation turns in Chinese, each with "from" (human/system/gpt) and "value" (message text) |
| chosen | object | Yes | The preferred response with "from": "gpt" and "value" containing the chosen Chinese text |
| rejected | object | Yes | The dispreferred response with "from": "gpt" and "value" containing the rejected Chinese text |

Conversation Turn Schema

| Field | Type | Required | Description |
|---|---|---|---|
| from | string | Yes | Role identifier: "human", "gpt", or "system" |
| value | string | Yes | The message content in Chinese |
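The schema above can be checked mechanically before a training run. The sketch below is one possible validator written for this page, not a LLaMA Factory utility; the names `validate_record` and `_check_turn` are hypothetical, and the requirement that conversations be non-empty is an assumption beyond what the schema states.

```python
from typing import Any, List

ALLOWED_ROLES = {"human", "gpt", "system"}


def _check_turn(turn: Any, label: str) -> List[str]:
    """Check one {"from", "value"} object against the turn schema."""
    if not isinstance(turn, dict):
        return [f"{label}: expected an object"]
    errors = []
    role = turn.get("from")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        errors.append(f"{label}: 'from' must be one of {sorted(ALLOWED_ROLES)}")
    if not isinstance(turn.get("value"), str):
        errors.append(f"{label}: 'value' must be a string")
    return errors


def validate_record(record: Any) -> List[str]:
    """Return a list of schema problems; an empty list means the record passed."""
    if not isinstance(record, dict):
        return ["record: expected an object"]
    errors = []
    conversations = record.get("conversations")
    # Assumption: an empty conversation list is treated as invalid.
    if not isinstance(conversations, list) or not conversations:
        errors.append("conversations: expected a non-empty array")
    else:
        for i, turn in enumerate(conversations):
            errors.extend(_check_turn(turn, f"conversations[{i}]"))
    for key in ("chosen", "rejected"):
        response = record.get(key)
        errors.extend(_check_turn(response, key))
        # Per the schema, both responses come from the assistant role.
        if isinstance(response, dict) and response.get("from") != "gpt":
            errors.append(f"{key}: 'from' should be 'gpt'")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a malformed record at once.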

Dataset Registry Entry

| Property | Value |
|---|---|
| Key | dpo_zh_demo |
| file_name | dpo_zh_demo.json |
| formatting | sharegpt |
| ranking | true |
| columns.messages | conversations |
| columns.chosen | chosen |
| columns.rejected | rejected |
| Lines | 5058 |
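Read as a nested structure, the properties above correspond to an entry of roughly the following shape in dataset_info.json. This is a reconstruction from the table rather than a copy of the file, so key order and surrounding entries are assumptions; the "Lines" row is left out because it does not appear to be a registry key.

```python
import json

# Reconstructed from the registry table above; not copied from dataset_info.json.
registry_entry = {
    "dpo_zh_demo": {
        "file_name": "dpo_zh_demo.json",
        "formatting": "sharegpt",
        "ranking": True,
        "columns": {
            "messages": "conversations",
            "chosen": "chosen",
            "rejected": "rejected",
        },
    }
}

# Serialize to see the JSON form (Python's True becomes JSON true).
print(json.dumps(registry_entry, indent=2, ensure_ascii=False))
```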

Usage Examples

# Reference the dataset in a LLaMA Factory training config for DPO
# llamafactory-cli train \
#     --dataset dpo_zh_demo \
#     --stage dpo \
#     --model_name_or_path meta-llama/Llama-2-7b-hf \
#     --output_dir output/dpo_zh_demo

# Or for reward model training
# llamafactory-cli train \
#     --dataset dpo_zh_demo \
#     --stage rm \
#     --model_name_or_path meta-llama/Llama-2-7b-hf \
#     --output_dir output/rm_zh_demo

# Loading the data manually for inspection
import json

with open("data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"Number of preference pairs: {len(data)}")
sample = data[0]
print(f"Conversation turns: {len(sample['conversations'])}")
print(f"Chosen response preview: {sample['chosen']['value'][:60]}...")
print(f"Rejected response preview: {sample['rejected']['value'][:60]}...")
