Implementation:Hpcaitech ColossalAI GaoKaoBenchDataset
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Benchmarking |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
GaoKaoBenchDataset is a dataset wrapper class that loads and converts the GAOKAO-Bench (Chinese National College Entrance Exam) benchmark into the ColossalEval inference format, covering fill-in-the-blank, multiple-choice, and open-ended question types.
Description
The class extends BaseDataset and provides a static load method that reads the JSON files in three subdirectories: "Fill-in-the-blank_Questions", "Multiple-choice_Questions", and "Open-ended_Questions". It handles exam subjects in both Chinese and English, and detects the answer options of each question with the get_all_classes helper, which parses option labels out of the question text. Answers to questions with more than one correct choice are concatenated into a single string so that all answers can be processed the same way. The module also documents several known data-quality issues in the original dataset.
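The option detection and answer joining described above can be sketched as follows. This is a minimal illustration, not the actual get_all_classes: the names detect_option_letters and join_multi_answer, and the option pattern (a capital letter followed by an ASCII "." or a full-width "．"), are assumptions.

```python
import re
from typing import List, Optional

# Assumed option pattern: a capital letter followed by "." or full-width "．",
# as in "A. 4" or "B．乙". The real helper may use a different pattern.
OPTION_PATTERN = re.compile(r"([A-Z])(?:\.|．)")


def detect_option_letters(question: str) -> Optional[List[str]]:
    """Return the unbroken run of option letters A, B, C, ... found in the text.

    Returns None when no "A" option is present (e.g. a free-form question).
    """
    found = set(OPTION_PATTERN.findall(question))
    letters: List[str] = []
    for code in range(ord("A"), ord("Z") + 1):
        letter = chr(code)
        if letter not in found:
            break  # stop at the first missing letter
        letters.append(letter)
    return letters or None


def join_multi_answer(answer: List[str]) -> str:
    """Concatenate a multi-choice answer such as ["A", "C"] into "AC"."""
    return "".join(answer)


print(detect_option_letters("Which is prime? A. 4 B. 6 C. 7 D. 9"))  # ['A', 'B', 'C', 'D']
print(join_multi_answer(["A", "C"]))  # AC
```

Stopping at the first missing letter keeps stray matches, such as the "S." in "U.S.", out of the option list, although a pattern this simple cannot rule out every false positive.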
Usage
Use this class when you need to evaluate a language model on the GAOKAO-Bench dataset within the ColossalEval framework. The data directory should follow the GAOKAO-Bench repository structure with a "data" subdirectory containing the three question-type folders.
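A quick pre-flight check of that layout, run before the conversion, could look like the sketch below. list_gaokao_files is a hypothetical helper, not part of ColossalEval; it only assumes the "data/<question type>/*.json" structure described on this page.

```python
from pathlib import Path
from typing import List

# Folder names as listed in the description of GaoKaoBenchDataset.load.
QUESTION_TYPE_DIRS = (
    "Fill-in-the-blank_Questions",
    "Multiple-choice_Questions",
    "Open-ended_Questions",
)


def list_gaokao_files(root: str) -> List[Path]:
    """Return the JSON files under <root>/data/<question type>/, sorted per folder.

    Raises FileNotFoundError if one of the expected folders is missing.
    """
    data_dir = Path(root) / "data"
    files: List[Path] = []
    for name in QUESTION_TYPE_DIRS:
        folder = data_dir / name
        if not folder.is_dir():
            raise FileNotFoundError(f"expected folder not found: {folder}")
        files.extend(sorted(folder.glob("*.json")))
    return files
```

For example, list_gaokao_files("/path/to/GAOKAO-Bench") fails fast with a readable error if the path points at the wrong level (such as the "data" folder itself) instead of the repository root.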
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalEval/colossal_eval/dataset/gaokaobench.py
- Lines: 1-123
Signature
```python
class GaoKaoBenchDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, *args, **kwargs) -> List[Dict]:
        ...
```
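The usage example on this page constructs the class and then calls save, while load itself is static. The sketch below illustrates that pattern with hypothetical stand-ins (MiniBaseDataset, ToyDataset); the assumption that the constructor delegates to load and that save writes JSON is ours, and ColossalEval's actual BaseDataset may differ in its details.

```python
import json
from typing import Any, Dict


class MiniBaseDataset:
    """Hypothetical stand-in for ColossalEval's BaseDataset (assumed behavior)."""

    def __init__(self, path: str, logger: Any, *args, **kwargs) -> None:
        # Assumption: construction delegates to the subclass's static load().
        self.dataset = self.load(path, logger, *args, **kwargs)

    @staticmethod
    def load(path: str, logger: Any, *args, **kwargs) -> Dict:
        raise NotImplementedError

    def save(self, save_path: str) -> None:
        # Assumption: the converted dataset is written out as JSON.
        with open(save_path, "w", encoding="utf-8") as f:
            json.dump(self.dataset, f, ensure_ascii=False, indent=2)


class ToyDataset(MiniBaseDataset):
    @staticmethod
    def load(path: str, logger: Any, *args, **kwargs) -> Dict:
        # A real loader would read files under `path`; this one returns a fixed split.
        return {"test": {"toy": {"data": [{"instruction": f"from {path}"}]}}}
```

Because load is a static method, it can also be called on the class without creating an instance (for example GaoKaoBenchDataset.load(path, logger)) when only the converted dictionary is needed.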
Import
```python
from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the GAOKAO-Bench root directory containing a "data" subdirectory with question-type folders |
| logger | DistributedLogger | Yes | Logger instance for distributed logging |
Outputs
| Name | Type | Description |
|---|---|---|
| dataset | Dict[str, Dict] | A nested dictionary (despite the List[Dict] return annotation in the signature) with a single split, "test", that maps each subject category to "data" (the list of data samples) and "inference_kwargs" (calculate_loss=True, all_classes auto-detected or None, language set per subject, max_new_tokens=32) |
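To make that nesting concrete, the sketch below walks a returned dictionary and counts samples per category. The example dictionary is a hypothetical miniature: the category names and the per-sample "target" field are illustrative placeholders, not values taken from GAOKAO-Bench or from the ColossalEval source.

```python
from typing import Any, Dict


def summarize(dataset: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Count the samples in each subject category of the "test" split."""
    return {category: len(entry["data"]) for category, entry in dataset["test"].items()}


# Hypothetical miniature of the output shape; names and sample fields are placeholders.
example = {
    "test": {
        "Physics MCQs": {
            "data": [{"target": "AC"}, {"target": "B"}],
            "inference_kwargs": {
                "calculate_loss": True,
                "all_classes": ["A", "B", "C", "D"],
                "language": "Chinese",
                "max_new_tokens": 32,
            },
        },
        "English Reading Comp": {
            "data": [{"target": "D"}],
            "inference_kwargs": {
                "calculate_loss": True,
                "all_classes": ["A", "B", "C", "D"],
                "language": "English",
                "max_new_tokens": 32,
            },
        },
    }
}

print(summarize(example))  # {'Physics MCQs': 2, 'English Reading Comp': 1}
```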
Usage Examples
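```python
from colossal_eval.dataset.gaokaobench import GaoKaoBenchDataset
from colossalai.logging import get_dist_logger

# get_dist_logger returns the named DistributedLogger, creating it on first use
logger = get_dist_logger("gaokaobench")
# Load GAOKAO-Bench from its root directory and convert it to the ColossalEval inference format
dataset = GaoKaoBenchDataset(path="/path/to/GAOKAO-Bench", logger=logger)
# Write the converted dataset to disk
dataset.save("/path/to/output.json")
```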