
Implementation:Alibaba ROLL DPO Zh Demo Dataset

From Leeroopedia


Knowledge Sources
Domains NLP, Preference_Data, DPO
Last Updated 2026-02-07 20:00 GMT

Overview

A Chinese-language demo dataset of conversation/chosen/rejected preference pairs in ShareGPT format, intended for Direct Preference Optimization (DPO) preference-alignment training.

Description

dpo_zh_demo.json is a 5,058-line JSON array of preference-annotated Chinese conversation data designed for Direct Preference Optimization (DPO) training. Each entry uses the ShareGPT conversation format with three fields:

  • conversations -- A list of conversation turns, each with a "from" field (typically "human") and a "value" field containing the user's Chinese prompt or question.
  • chosen -- An object with "from": "gpt" and a "value" field containing the preferred, higher-quality assistant response.
  • rejected -- An object with "from": "gpt" and a "value" field containing the less preferred, lower-quality assistant response.

The dataset covers diverse Chinese-language topics including data analysis (e.g., Congressional retweet analysis with Python code), financial analysis (e.g., Hong Kong banking sector), logical reasoning, and general knowledge. Chosen responses tend to be detailed, well-structured, and accurate, while rejected responses are typically shorter, less precise, or use incorrect methodology.

This file is registered in the dataset_info.json registry under the key "dpo_zh_demo" with ranking: true and formatting: "sharegpt".
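Based on LLaMA-Factory's dataset_info.json conventions, the registry entry for this key likely resembles the sketch below. The "columns" mapping is an assumption inferred from the file's field names and should be checked against the repository's own dataset_info.json:

```json
{
  "dpo_zh_demo": {
    "file_name": "dpo_zh_demo.json",
    "ranking": true,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  }
}
```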

Usage

Use this dataset as a demo or starting point for:

  • DPO preference alignment training on Chinese-language LLMs
  • Testing DPO data loading pipelines in the LLaMA-Factory / mcore_adapter framework
  • Validating ShareGPT-format preference data parsing

Code Reference

Source Location

  • Repository: Alibaba_ROLL
  • File: mcore_adapter/examples/data/dpo_zh_demo.json

Data Schema / Signature

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "string  -- The user's Chinese prompt or question"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "string  -- The preferred high-quality response"
    },
    "rejected": {
      "from": "gpt",
      "value": "string  -- The less preferred lower-quality response"
    }
  }
]
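A minimal structural check against this schema can be sketched as follows. The validator and the sample entry are illustrative (the sample is a made-up entry in the documented shape, not taken from the file):

```python
def validate_dpo_entry(entry: dict) -> bool:
    """Return True if entry matches the conversations/chosen/rejected schema."""
    turns = entry.get("conversations")
    # conversations must be a non-empty list of {"from": str, "value": str} turns
    if not isinstance(turns, list) or not turns:
        return False
    if not all(isinstance(t, dict)
               and isinstance(t.get("from"), str)
               and isinstance(t.get("value"), str) for t in turns):
        return False
    # chosen and rejected must each be a {"from": "gpt", "value": str} object
    for key in ("chosen", "rejected"):
        resp = entry.get(key)
        if not (isinstance(resp, dict)
                and resp.get("from") == "gpt"
                and isinstance(resp.get("value"), str)):
            return False
    return True

# Hypothetical sample entry illustrating the schema
sample = {
    "conversations": [{"from": "human", "value": "什么是机器学习？"}],
    "chosen": {"from": "gpt", "value": "机器学习是人工智能的一个分支……"},
    "rejected": {"from": "gpt", "value": "不知道。"},
}
print(validate_dpo_entry(sample))  # True
```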

I/O Contract

Inputs

Field                 | Type             | Required | Description
conversations         | array of objects | Yes      | List of conversation turns; each turn has "from" (role) and "value" (content)
conversations[].from  | string           | Yes      | Speaker role, typically "human" for user turns
conversations[].value | string           | Yes      | The Chinese text content of the conversation turn

Outputs

Field    | Type   | Description
chosen   | object | The preferred response: "from" is "gpt", "value" holds the higher-quality Chinese response
rejected | object | The rejected response: "from" is "gpt", "value" holds the lower-quality Chinese response
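A loader consuming this contract typically flattens each entry into a (prompt, chosen, rejected) string triple for the DPO loss. The helper below is a sketch of that step, not code from the repository; the sample entry is made up:

```python
def to_preference_triple(entry):
    """Flatten a ShareGPT preference entry into (prompt, chosen, rejected)."""
    # Join the human turns in order; the demo entries are single-turn, but
    # multi-turn entries would be concatenated the same way.
    prompt = "\n".join(t["value"] for t in entry["conversations"]
                       if t["from"] == "human")
    return prompt, entry["chosen"]["value"], entry["rejected"]["value"]

# Hypothetical entry in the dataset's schema
entry = {
    "conversations": [{"from": "human", "value": "请分析香港银行业的现状。"}],
    "chosen": {"from": "gpt", "value": "香港银行业近年来保持稳健……"},
    "rejected": {"from": "gpt", "value": "银行业很好。"},
}
prompt, chosen, rejected = to_preference_triple(entry)
print(prompt)
```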

Usage Examples

import json

# Load the Chinese DPO demo dataset
with open("mcore_adapter/examples/data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    dpo_data = json.load(f)

print(f"Total preference pairs: {len(dpo_data)}")

# Inspect one entry
entry = dpo_data[0]
user_prompt = entry["conversations"][0]["value"]
chosen_response = entry["chosen"]["value"]
rejected_response = entry["rejected"]["value"]

print(f"User prompt: {user_prompt[:80]}...")
print(f"Chosen length: {len(chosen_response)} chars")
print(f"Rejected length: {len(rejected_response)} chars")

# Use with LLaMA-Factory dataset_info.json registry
# In your training config YAML, reference the dataset by name:
# dataset: dpo_zh_demo
# The data loader will look up "dpo_zh_demo" in dataset_info.json and apply
# the configured formatting ("sharegpt") and column mappings automatically.
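The Description notes that chosen responses tend to be longer and more detailed than rejected ones. That tendency can be checked with a quick length comparison; the snippet below runs on a small made-up list in the dataset's schema (substitute the dpo_data loaded above to measure the real file):

```python
# Hypothetical mini-dataset in the conversations/chosen/rejected schema
mini_data = [
    {"conversations": [{"from": "human", "value": "什么是DPO？"}],
     "chosen": {"from": "gpt", "value": "DPO（直接偏好优化）是一种无需显式奖励模型的对齐方法……"},
     "rejected": {"from": "gpt", "value": "是一种训练。"}},
    {"conversations": [{"from": "human", "value": "解释过拟合。"}],
     "chosen": {"from": "gpt", "value": "过拟合指模型在训练集上表现很好，但在新数据上泛化较差……"},
     "rejected": {"from": "gpt", "value": "模型太复杂。"}},
]

# Average character length of chosen vs. rejected responses
avg_chosen = sum(len(e["chosen"]["value"]) for e in mini_data) / len(mini_data)
avg_rejected = sum(len(e["rejected"]["value"]) for e in mini_data) / len(mini_data)
print(f"avg chosen: {avg_chosen:.1f} chars, avg rejected: {avg_rejected:.1f} chars")
```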
