Implementation:Hpcaitech ColossalAI GaoKaoBenchDataset

Knowledge Sources	Hpcaitech_ColossalAI
Domains	Evaluation, Benchmarking
Last Updated	2026-02-09 00:00 GMT

Overview

GaoKaoBenchDataset is a dataset wrapper class that loads and converts the GAOKAO-Bench (Chinese National College Entrance Exam) benchmark into the ColossalEval inference format, covering fill-in-the-blank, multiple-choice, and open-ended question types.

Description

The class extends BaseDataset and provides a static load method that reads JSON files from three subdirectories: "Fill-in-the-blank_Questions", "Multiple-choice_Questions", and "Open-ended_Questions". It handles both Chinese and English exam subjects and sets the inference language per subject. Answer option classes are detected automatically with the get_all_classes helper function, which parses option patterns from the question text. Answers that consist of multiple choices are concatenated into a single string for consistent processing. The module also documents several known data quality issues in the original dataset.
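
The option detection and answer concatenation described above can be illustrated with a minimal sketch. This is not the code from gaokaobench.py: the names detect_option_letters and join_answer are hypothetical, and the regular expression is an assumption about how option markers such as "A." appear in question text.

```python
import re
from typing import List


def detect_option_letters(question: str) -> List[str]:
    """Hypothetical helper: return the consecutive option letters (A, B, C, ...)
    that appear in the question followed by an ASCII or full-width period."""
    found = set(re.findall(r"([A-Z])[.．]", question))
    letters: List[str] = []
    # Keep only the unbroken run starting at "A", so a stray "X." in the
    # question body is not mistaken for an answer option.
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if letter not in found:
            break
        letters.append(letter)
    return letters


def join_answer(answer: List[str]) -> str:
    """Hypothetical helper: concatenate a multi-choice answer into one string."""
    return "".join(answer)


sample_question = "下列说法正确的是\nA. 甲\nB. 乙\nC. 丙\nD. 丁"
print(detect_option_letters(sample_question))
print(join_answer(["A", "C"]))
```

Stopping at the first missing letter is one possible design choice; it trades recall for robustness against abbreviations inside the question body.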

Usage

Use this class when you need to evaluate a language model on the GAOKAO-Bench dataset within the ColossalEval framework. The data directory should follow the GAOKAO-Bench repository structure with a "data" subdirectory containing the three question-type folders.
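
Before loading, it may help to confirm that the directory matches the layout described above. The following is a small sketch using only the standard library; missing_question_folders is a hypothetical helper, not part of ColossalEval, and the root path is a placeholder.

```python
from pathlib import Path
from typing import List

# Folder names taken from the description of the load method.
EXPECTED_FOLDERS = [
    "Fill-in-the-blank_Questions",
    "Multiple-choice_Questions",
    "Open-ended_Questions",
]


def missing_question_folders(root: str) -> List[str]:
    """Return the expected question-type folders that are absent under <root>/data."""
    data_dir = Path(root) / "data"
    return [name for name in EXPECTED_FOLDERS if not (data_dir / name).is_dir()]


missing = missing_question_folders("/path/to/GAOKAO-Bench")
if missing:
    print("Missing folders:", ", ".join(missing))
```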

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalEval/colossal_eval/dataset/gaokaobench.py
Lines: 1-123

Signature

class GaoKaoBenchDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, *args, **kwargs) -> List[Dict]:

Import

from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset

I/O Contract

Inputs

Name	Type	Required	Description
path	str	Yes	Path to the GAOKAO-Bench root directory containing a "data" subdirectory with question-type folders
logger	DistributedLogger	Yes	Logger instance for distributed logging

Outputs

Name	Type	Description
dataset	Dict[str, Dict]	A nested dictionary whose "test" split maps each subject category to a dictionary with "data" (a list of data samples) and "inference_kwargs" (calculate_loss=True; all_classes auto-detected or None; language set per subject; max_new_tokens=32). The signature annotates the return type as List[Dict], but the structure described here is a dictionary.
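
The shape of the output can be sketched as a plain dictionary. The subject name and the all_classes value below are illustrative placeholders, and the "data" list is left empty because the fields of an individual sample are not specified on this page.

```python
from typing import Any, Dict, Tuple

# Illustrative structure only; "hypothetical subject" is a placeholder key.
example_output: Dict[str, Dict[str, Dict[str, Any]]] = {
    "test": {
        "hypothetical subject": {
            "data": [],  # converted samples would appear here
            "inference_kwargs": {
                "calculate_loss": True,
                "all_classes": ["A", "B", "C", "D"],  # or None when not detected
                "language": "Chinese",
                "max_new_tokens": 32,
            },
        }
    }
}


def summarize(dataset: Dict[str, Any]) -> Dict[str, Tuple[int, str, Any]]:
    """Map each subject to (sample count, language, option classes)."""
    summary = {}
    for subject, entry in dataset["test"].items():
        kwargs = entry["inference_kwargs"]
        summary[subject] = (len(entry["data"]), kwargs["language"], kwargs["all_classes"])
    return summary


print(summarize(example_output))
```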

Usage Examples

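from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset
from colossalai.logging import DistributedLogger

# Logger passed through to the dataset loader.
logger = DistributedLogger("gaokaobench")

# path points at the GAOKAO-Bench root, which must contain the "data" subdirectory.
dataset = GaoKaoBenchDataset(path="/path/to/GAOKAO-Bench", logger=logger)

# Write the converted dataset to disk.
dataset.save("/path/to/output.json")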
