Implementation:Hpcaitech ColossalAI CValuesDataset

Knowledge Sources	Hpcaitech_ColossalAI
Domains	Evaluation, Benchmarking
Last Updated	2026-02-09 00:00 GMT

Overview

CValuesDataset is a dataset wrapper class that loads the CValues safety and responsibility benchmark and converts it into the ColossalEval inference format, so that a model's judgment of which response is safer and more responsible can be evaluated.

Description

The class extends BaseDataset and provides a static load method that reads a JSONL file named "cvalues_responsibility_mc.jsonl" from the directory given by path. Each sample presents a question with a positive and a negative response, assigned at random to choices A and B; the model must judge which response is safer and more responsible. The loader deduplicates samples by their meta_info content and groups them by Chinese domain category. The default inference kwargs disable loss calculation and set all_classes to ["A", "B"] for binary classification.
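
The deduplication, random A/B assignment, and grouping by domain described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ColossalEval source: the meta_info field names (Question, positive_response, negative_response, domain_zh) and the helpers dedupe_meta_info and build_sample are hypothetical and chosen only for the example.

```python
import random
from typing import Dict, List


def dedupe_meta_info(records: List[Dict]) -> List[Dict]:
    """Drop records whose meta_info content is identical (hypothetical helper)."""
    seen = set()
    unique = []
    for record in records:
        meta = record["meta_info"]
        # A sorted tuple of items is hashable and ignores key order.
        key = tuple(sorted(meta.items()))
        if key not in seen:
            seen.add(key)
            unique.append(meta)
    return unique


def build_sample(meta: Dict, rng: random.Random) -> Dict:
    """Place the positive response under A or B at random and record the target."""
    responses = [
        ("positive", meta["positive_response"]),
        ("negative", meta["negative_response"]),
    ]
    rng.shuffle(responses)
    target = "A" if responses[0][0] == "positive" else "B"
    return {
        "question": meta["Question"],
        "A": responses[0][1],
        "B": responses[1][1],
        "target": target,
        "category": meta["domain_zh"],
    }


# Assumed record layout; the first two records are exact duplicates.
records = [
    {"meta_info": {"Question": "q1", "positive_response": "safe", "negative_response": "unsafe", "domain_zh": "法律"}},
    {"meta_info": {"Question": "q1", "positive_response": "safe", "negative_response": "unsafe", "domain_zh": "法律"}},
    {"meta_info": {"Question": "q2", "positive_response": "ok", "negative_response": "bad", "domain_zh": "心理"}},
]

rng = random.Random(0)  # seeded only so the sketch is repeatable
unique = dedupe_meta_info(records)
samples = [build_sample(meta, rng) for meta in unique]

# Group the samples by their domain category.
by_category: Dict[str, List[Dict]] = {}
for sample in samples:
    by_category.setdefault(sample["category"], []).append(sample)
```

Because the positive response can land under either letter, the target has to be recorded at the moment of shuffling; a model that always answers "A" should then score near chance.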

Usage

Use this class when you need to evaluate a language model on safety and responsibility judgment tasks from the CValues benchmark within the ColossalEval framework.

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalEval/colossal_eval/dataset/cvalues.py
Lines: 1-67

Signature

class CValuesDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, *args, **kwargs) -> List[Dict]:
        ...

Import

from colossal_eval.dataset.cvalues import CValuesDataset

I/O Contract

Inputs

Name	Type	Required	Description
path	str	Yes	Path to the directory containing the "cvalues_responsibility_mc.jsonl" file
logger	DistributedLogger	Yes	Logger instance for distributed logging
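
Since path names a directory rather than a file, the loader has to join it with the fixed file name before reading. A minimal sketch of that step is shown below, assuming one JSON object per line; the helper read_jsonl and the sample rows are hypothetical, and the real loader may differ in details such as encoding or blank-line handling.

```python
import json
import os
import tempfile
from typing import Dict, List

FILE_NAME = "cvalues_responsibility_mc.jsonl"


def read_jsonl(directory: str) -> List[Dict]:
    """Read one JSON object per non-empty line from the benchmark file in `directory`."""
    file_path = os.path.join(directory, FILE_NAME)
    rows = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# Build a throwaway directory with a two-line file to exercise the reader.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, FILE_NAME), "w", encoding="utf-8") as f:
        f.write(json.dumps({"meta_info": {"Question": "q1"}}, ensure_ascii=False) + "\n")
        f.write(json.dumps({"meta_info": {"Question": "q2"}}, ensure_ascii=False) + "\n")
    rows = read_jsonl(tmp_dir)
```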

Outputs

Name	Type	Description
dataset	Dict[str, Dict]	A nested dictionary with a single split, "test", keyed by domain category. Each category holds "data" (a list of samples with the fields dataset, split, category, instruction, input, output, target, id) and "inference_kwargs" (calculate_loss=False, all_classes=["A","B"], language="Chinese", max_new_tokens=32). Note that the signature above annotates the return type as List[Dict], although the returned object is this nested dictionary.
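
The shape of the returned dictionary can be illustrated with a small sketch. The field names and inference kwargs come from the table above; the helper add_sample, the placeholder values, and the use of deepcopy (so that categories do not share one mutable kwargs dict) are assumptions for the example rather than the library's behavior.

```python
import copy
from typing import Dict

DEFAULT_INFERENCE_KWARGS = {
    "calculate_loss": False,
    "all_classes": ["A", "B"],
    "language": "Chinese",
    "max_new_tokens": 32,
}


def add_sample(dataset: Dict, category: str, sample: Dict) -> None:
    """Append a sample under dataset["test"][category], creating the category if needed."""
    split = dataset.setdefault("test", {})
    if category not in split:
        split[category] = {
            "data": [],
            # Copy so that editing one category's kwargs does not affect the others.
            "inference_kwargs": copy.deepcopy(DEFAULT_INFERENCE_KWARGS),
        }
    split[category]["data"].append(sample)


dataset: Dict = {}
add_sample(
    dataset,
    "法律",
    {
        "dataset": "cvalues",
        "split": "test",
        "category": "法律",
        "instruction": "placeholder instruction",
        "input": "placeholder question with choices A and B",
        "output": "",
        "target": "A",
        "id": 0,
    },
)
```

A consumer can then iterate over dataset["test"].items() to run inference per category with that category's inference_kwargs.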

Usage Examples

from colossal_eval.dataset.cvalues import CValuesDataset
from colossalai.logging import DistributedLogger

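logger = DistributedLogger("cvalues")
# path is the directory that contains cvalues_responsibility_mc.jsonl
dataset = CValuesDataset(path="/path/to/cvalues/data", logger=logger)
# Write the converted dataset to a JSON file
dataset.save("/path/to/output.json")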
