
Implementation:Alibaba ROLL DPO Zh Demo Dataset

From Leeroopedia


Knowledge Sources
Domains NLP, Preference_Data, DPO
Last Updated 2026-02-07 20:00 GMT

Overview

A Chinese-language demo dataset of conversation/chosen/rejected preference pairs in ShareGPT format, intended for Direct Preference Optimization (DPO) preference-alignment training.

Description

dpo_zh_demo.json is a 5,058-line JSON array of preference-annotated Chinese conversation data designed for Direct Preference Optimization (DPO) training. Each entry uses the ShareGPT conversation format with three fields:

  • conversations -- A list of conversation turns, each with a "from" field (typically "human") and a "value" field containing the user's Chinese prompt or question.
  • chosen -- An object with "from": "gpt" and a "value" field containing the preferred, higher-quality assistant response.
  • rejected -- An object with "from": "gpt" and a "value" field containing the less preferred, lower-quality assistant response.

The dataset covers diverse Chinese-language topics including data analysis (e.g., Congressional retweet analysis with Python code), financial analysis (e.g., Hong Kong banking sector), logical reasoning, and general knowledge. Chosen responses tend to be detailed, well-structured, and accurate, while rejected responses are typically shorter, less precise, or use incorrect methodology.

This file is registered in the dataset_info.json registry under the key "dpo_zh_demo" with ranking: true and formatting: "sharegpt".
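Based on LLaMA-Factory's dataset_info.json conventions, the registry entry for this key likely resembles the sketch below. The "columns" mapping is an assumption inferred from the file's field names and should be checked against the repository's own dataset_info.json:

```json
{
  "dpo_zh_demo": {
    "file_name": "dpo_zh_demo.json",
    "ranking": true,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  }
}
```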

Usage

Use this dataset as a demo or starting point for:

  • DPO preference alignment training on Chinese-language LLMs
  • Testing DPO data loading pipelines in the LLaMA-Factory / mcore_adapter framework
  • Validating ShareGPT-format preference data parsing

Code Reference

Source Location

  • Repository: Alibaba_ROLL
  • File: mcore_adapter/examples/data/dpo_zh_demo.json

Data Schema / Signature

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "string  -- The user's Chinese prompt or question"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "string  -- The preferred high-quality response"
    },
    "rejected": {
      "from": "gpt",
      "value": "string  -- The less preferred lower-quality response"
    }
  }
]
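A minimal structural check against this schema can be sketched as follows. The validator and the sample entry are illustrative (the sample is a made-up entry in the documented shape, not taken from the file):

```python
def validate_dpo_entry(entry: dict) -> bool:
    """Return True if entry matches the conversations/chosen/rejected schema."""
    turns = entry.get("conversations")
    # conversations must be a non-empty list of {"from": str, "value": str} turns
    if not isinstance(turns, list) or not turns:
        return False
    if not all(isinstance(t, dict)
               and isinstance(t.get("from"), str)
               and isinstance(t.get("value"), str) for t in turns):
        return False
    # chosen and rejected must each be a {"from": "gpt", "value": str} object
    for key in ("chosen", "rejected"):
        resp = entry.get(key)
        if not (isinstance(resp, dict)
                and resp.get("from") == "gpt"
                and isinstance(resp.get("value"), str)):
            return False
    return True

# Hypothetical sample entry illustrating the schema
sample = {
    "conversations": [{"from": "human", "value": "什么是机器学习？"}],
    "chosen": {"from": "gpt", "value": "机器学习是人工智能的一个分支……"},
    "rejected": {"from": "gpt", "value": "不知道。"},
}
print(validate_dpo_entry(sample))  # True
```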

I/O Contract

Inputs

Field                 | Type             | Required | Description
conversations         | array of objects | Yes      | List of conversation turns; each turn has "from" (role) and "value" (content)
conversations[].from  | string           | Yes      | Speaker role, typically "human" for user turns
conversations[].value | string           | Yes      | The Chinese text content of the conversation turn

Outputs

Field    | Type   | Description
chosen   | object | The preferred response: "from" is "gpt", "value" holds the higher-quality Chinese response
rejected | object | The rejected response: "from" is "gpt", "value" holds the lower-quality Chinese response
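A loader consuming this contract typically flattens each entry into a (prompt, chosen, rejected) string triple for the DPO loss. The helper below is a sketch of that step, not code from the repository; the sample entry is made up:

```python
def to_preference_triple(entry):
    """Flatten a ShareGPT preference entry into (prompt, chosen, rejected)."""
    # Join the human turns in order; the demo entries are single-turn, but
    # multi-turn entries would be concatenated the same way.
    prompt = "\n".join(t["value"] for t in entry["conversations"]
                       if t["from"] == "human")
    return prompt, entry["chosen"]["value"], entry["rejected"]["value"]

# Hypothetical entry in the dataset's schema
entry = {
    "conversations": [{"from": "human", "value": "请分析香港银行业的现状。"}],
    "chosen": {"from": "gpt", "value": "香港银行业近年来保持稳健……"},
    "rejected": {"from": "gpt", "value": "银行业很好。"},
}
prompt, chosen, rejected = to_preference_triple(entry)
print(prompt)
```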

Usage Examples

import json

# Load the Chinese DPO demo dataset
with open("mcore_adapter/examples/data/dpo_zh_demo.json", "r", encoding="utf-8") as f:
    dpo_data = json.load(f)

print(f"Total preference pairs: {len(dpo_data)}")

# Inspect one entry
entry = dpo_data[0]
user_prompt = entry["conversations"][0]["value"]
chosen_response = entry["chosen"]["value"]
rejected_response = entry["rejected"]["value"]

print(f"User prompt: {user_prompt[:80]}...")
print(f"Chosen length: {len(chosen_response)} chars")
print(f"Rejected length: {len(rejected_response)} chars")

# Use with LLaMA-Factory dataset_info.json registry
# In your training config YAML, reference the dataset by name:
# dataset: dpo_zh_demo
# The data loader will look up "dpo_zh_demo" in dataset_info.json and apply
# the configured formatting ("sharegpt") and column mappings automatically.
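The Description notes that chosen responses tend to be longer and more detailed than rejected ones. That tendency can be checked with a quick length comparison; the snippet below runs on a small made-up list in the dataset's schema (substitute the dpo_data loaded above to measure the real file):

```python
# Hypothetical mini-dataset in the conversations/chosen/rejected schema
mini_data = [
    {"conversations": [{"from": "human", "value": "什么是DPO？"}],
     "chosen": {"from": "gpt", "value": "DPO（直接偏好优化）是一种无需显式奖励模型的对齐方法……"},
     "rejected": {"from": "gpt", "value": "是一种训练。"}},
    {"conversations": [{"from": "human", "value": "解释过拟合。"}],
     "chosen": {"from": "gpt", "value": "过拟合指模型在训练集上表现很好，但在新数据上泛化较差……"},
     "rejected": {"from": "gpt", "value": "模型太复杂。"}},
]

# Average character length of chosen vs. rejected responses
avg_chosen = sum(len(e["chosen"]["value"]) for e in mini_data) / len(mini_data)
avg_rejected = sum(len(e["rejected"]["value"]) for e in mini_data) / len(mini_data)
print(f"avg chosen: {avg_chosen:.1f} chars, avg rejected: {avg_rejected:.1f} chars")
```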
