Implementation:Hpcaitech ColossalAI GaoKaoBenchDataset

From Leeroopedia


Knowledge Sources
Domains: Evaluation, Benchmarking
Last Updated: 2026-02-09 00:00 GMT

Overview

GaoKaoBenchDataset is a dataset wrapper class that loads and converts the GAOKAO-Bench (Chinese National College Entrance Exam) benchmark into the ColossalEval inference format, covering fill-in-the-blank, multiple-choice, and open-ended question types.

Description

The class extends BaseDataset and provides a static load method that reads JSON files from three subdirectories: "Fill-in-the-blank_Questions", "Multiple-choice_Questions", and "Open-ended_Questions". It handles both Chinese and English exam subjects. Answer option classes are detected automatically by the get_all_classes helper function, which parses option patterns from the question text. Multi-choice answers are concatenated into single strings for consistent processing. The module also documents several known data quality issues in the original dataset.
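The option detection and answer concatenation described here can be illustrated with a small standalone sketch. This is not the library's code: detect_option_letters and join_answer are hypothetical names, and the regular expression is an assumption about what "option patterns" might look like (a capital letter followed by a period).

```python
import re
from typing import List, Union


def detect_option_letters(question: str) -> List[str]:
    """Hypothetical stand-in for get_all_classes: collect option letters such as "A." from a question."""
    # Find capital letters that are immediately followed by a period, e.g. "A." or "B.".
    found = sorted(set(re.findall(r"\b([A-Z])\.", question)))
    # Keep only the leading run that matches A, B, C, ... so stray capitals are ignored.
    letters = []
    for expected, letter in zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", found):
        if letter != expected:
            break
        letters.append(letter)
    return letters


def join_answer(answer: Union[str, List[str]]) -> str:
    """Collapse a list of selected options, e.g. ["A", "C"], into one string "AC"."""
    return "".join(answer) if isinstance(answer, list) else answer


question = "Which is correct?\nA. one\nB. two\nC. three\nD. four"
print(detect_option_letters(question))  # ['A', 'B', 'C', 'D']
print(join_answer(["A", "C"]))          # AC
```

Storing a multi-option answer as one string keeps the target field a plain string for every sample, which is one plausible reading of "consistent processing" above.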

Usage

Use this class when you need to evaluate a language model on the GAOKAO-Bench dataset within the ColossalEval framework. The data directory should follow the GAOKAO-Bench repository structure with a "data" subdirectory containing the three question-type folders.
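Before loading, it may help to confirm that the directory layout matches what is described above. The following is a minimal sketch, assuming only the three folder names listed in the Description; missing_question_dirs is a hypothetical helper, not part of ColossalEval.

```python
import os
import tempfile
from typing import List

# Folder names taken from the Description section of this page.
QUESTION_DIRS = (
    "Fill-in-the-blank_Questions",
    "Multiple-choice_Questions",
    "Open-ended_Questions",
)


def missing_question_dirs(root: str) -> List[str]:
    """Return the question-type folders that are absent under <root>/data."""
    data_dir = os.path.join(root, "data")
    return [name for name in QUESTION_DIRS if not os.path.isdir(os.path.join(data_dir, name))]


# Demonstration on a temporary directory that contains only one of the three folders.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data", "Multiple-choice_Questions"))
    print(missing_question_dirs(root))  # ['Fill-in-the-blank_Questions', 'Open-ended_Questions']
```

An empty result means all three question-type folders are present under the "data" subdirectory.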

Code Reference

Signature

class GaoKaoBenchDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, *args, **kwargs) -> List[Dict]:

Import

from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the GAOKAO-Bench root directory containing a "data" subdirectory with question-type folders
logger | DistributedLogger | Yes | Logger instance for distributed logging

Outputs

Name | Type | Description
dataset | Dict[str, Dict] | A nested dictionary keyed by split ("test"), then by subject category. Each category holds "data" (a list of data samples) and "inference_kwargs" (calculate_loss=True; all_classes auto-detected or None; language set per subject; max_new_tokens=32)

The signature annotates the return type as List[Dict], but the value described here is a nested dictionary.
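To make the nesting concrete, here is a hand-written sketch of the documented shape together with a small traversal. The subject name and the fields inside each data sample ("instruction", "target") are illustrative assumptions; only the "test" split, the "data" and "inference_kwargs" keys, and the listed inference_kwargs entries come from the table above.

```python
from typing import Any, Dict

# Hand-written example of the documented shape; subject name and sample fields are assumptions.
example: Dict[str, Dict[str, Any]] = {
    "test": {
        "example subject": {
            "data": [{"instruction": "question text", "target": "A"}],
            "inference_kwargs": {
                "calculate_loss": True,
                "all_classes": ["A", "B", "C", "D"],  # may also be None
                "language": "Chinese",
                "max_new_tokens": 32,
            },
        }
    }
}


def summarize(dataset: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Count data samples per subject category in the "test" split."""
    return {subject: len(entry["data"]) for subject, entry in dataset["test"].items()}


print(summarize(example))  # {'example subject': 1}
```

The same traversal pattern (split, then subject, then "data" or "inference_kwargs") should apply to the real output, assuming it follows the structure described in the table.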

Usage Examples

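from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset
from colossalai.logging import DistributedLogger

# Logger instance required by the dataset class.
logger = DistributedLogger("gaokaobench")
# path is the GAOKAO-Bench root directory; it must contain the "data" subdirectory.
dataset = GaoKaoBenchDataset(path="/path/to/GAOKAO-Bench", logger=logger)
# Save the converted dataset to a JSON file.
dataset.save("/path/to/output.json")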
