Implementation:Hpcaitech ColossalAI CEvalDataset

From Leeroopedia


Knowledge Sources
Domains Evaluation, Benchmarking
Last Updated 2026-02-09 00:00 GMT

Overview

CEvalDataset is a dataset wrapper class that loads and converts the C-Eval Chinese examination benchmark into the ColossalEval inference format, covering 52 subjects across STEM, Social Science, Humanities, and Other domains.

Description

The class extends BaseDataset and provides a static load method that reads per-subject CSV files from the "dev" and "test" splits. Each file corresponds to a subject in the ceval_subject_mapping dictionary, which maps English subject keys to their Chinese names and domain categories. The loader formats each question as a Chinese single-choice prompt with four options (A-D). When few-shot evaluation is enabled, it prepends dev-split examples to each test-split prompt as demonstrations.
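The prompt construction described above can be sketched roughly as follows. This is a minimal illustration, not the ColossalEval implementation: the function names `format_question` and `build_prompt`, the row keys (`question`, `A` to `D`, `answer`), and the Chinese template wording are all assumptions made for the example.

```python
from typing import Dict, List

CHOICES = ["A", "B", "C", "D"]


def format_question(row: Dict[str, str], include_answer: bool) -> str:
    """Render one row as a single-choice question with options A-D.

    The row keys used here are assumed for illustration.
    """
    lines = [f"题目：{row['question']}"]
    lines += [f"{choice}. {row[choice]}" for choice in CHOICES]
    # Demonstrations carry their answer; the question under test does not.
    lines.append(f"答案：{row['answer'] if include_answer else ''}")
    return "\n".join(lines)


def build_prompt(
    subject_zh: str,
    test_row: Dict[str, str],
    dev_rows: List[Dict[str, str]],
    few_shot: bool,
) -> str:
    """Build a prompt, optionally prepending dev-split demonstrations."""
    # Hypothetical instruction template: "The following are single-choice
    # questions from a Chinese exam on {subject}; choose the correct answer."
    parts = [f"以下是中国关于{subject_zh}考试的单项选择题，请选出其中的正确答案。"]
    if few_shot:
        parts += [format_question(r, include_answer=True) for r in dev_rows]
    parts.append(format_question(test_row, include_answer=False))
    return "\n\n".join(parts)
```

With `few_shot=False` the prompt contains only the instruction and the unanswered test question, so the same function covers both zero-shot and few-shot settings.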

Usage

Use this class when you need to evaluate a language model on the C-Eval benchmark within the ColossalEval framework. It expects the C-Eval dataset to be organized into "dev" and "test" subdirectories, each containing per-subject CSV files.
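Before pointing the loader at a directory, it can help to confirm that the layout matches what is described above. The helper below is a hypothetical convenience (`list_subject_files` is not part of ColossalEval) that only checks for the two split subdirectories and lists the CSV files in each.

```python
from pathlib import Path
from typing import Dict, List


def list_subject_files(root: str) -> Dict[str, List[str]]:
    """Return the CSV file names found under the "dev" and "test" splits.

    Raises FileNotFoundError if either split directory is missing.
    """
    found: Dict[str, List[str]] = {}
    for split in ("dev", "test"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # One CSV file per subject is expected in each split.
        found[split] = sorted(p.name for p in split_dir.glob("*.csv"))
    return found
```

Calling `list_subject_files("/path/to/ceval-exam")` on a correctly organized copy should return one list of file names per split.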

Code Reference

Source Location

Signature

class CEvalDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, few_shot: bool, *args, **kwargs) -> List[Dict]:

Import

from colossal_eval.dataset.ceval import CEvalDataset

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the directory containing "dev" and "test" subdirectories with per-subject CSV files
logger | DistributedLogger | Yes | Logger instance for distributed logging
few_shot | bool | Yes | Whether to prepend dev-split examples as few-shot demonstrations for the test split

Outputs

Name | Type | Description
dataset | Dict[str, Dict] | A nested dictionary keyed by split ("dev" and "test"). Each split holds one entry per subject, and each entry contains "data" (a list of samples with the fields dataset, split, category, instruction, input, output, target, id) and "inference_kwargs" (calculate_loss=False, all_classes=["A","B","C","D"], language="Chinese", max_new_tokens=32).

Note that the signature annotates the return type as List[Dict], while the value described here is a nested dictionary.
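To make the nesting concrete, the sketch below builds a tiny dictionary in the shape described above and counts the samples per split. The subject key `subject_a`, the field values, and the helpers `make_sample` and `count_samples` are placeholders invented for illustration; only the key names and the inference_kwargs values come from the description.

```python
from typing import Any, Dict

INFERENCE_KWARGS = {
    "calculate_loss": False,
    "all_classes": ["A", "B", "C", "D"],
    "language": "Chinese",
    "max_new_tokens": 32,
}


def make_sample(split: str, idx: int) -> Dict[str, Any]:
    """Create a placeholder sample carrying the documented field names."""
    return {
        "dataset": "ceval",
        "split": split,
        "category": "subject_a",  # hypothetical subject key
        "instruction": "",
        "input": "",
        "output": "",
        "target": "A",
        "id": idx,
    }


def count_samples(dataset: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Count samples per split by walking split -> subject -> "data"."""
    return {
        split: sum(len(entry["data"]) for entry in subjects.values())
        for split, subjects in dataset.items()
    }


example = {
    split: {
        "subject_a": {
            "data": [make_sample(split, i) for i in range(n)],
            "inference_kwargs": dict(INFERENCE_KWARGS),
        }
    }
    for split, n in (("dev", 1), ("test", 2))
}

print(count_samples(example))
```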

Usage Examples

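from colossal_eval.dataset.ceval import CEvalDataset
from colossalai.logging import DistributedLogger

# Logger passed to the dataset while it loads and converts the CSV files
logger = DistributedLogger("ceval")
# few_shot=True prepends dev-split examples as demonstrations for the test split
dataset = CEvalDataset(path="/path/to/ceval-exam", logger=logger, few_shot=True)
# Write the converted dataset to disk
dataset.save("/path/to/output.json")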
