Implementation:Hpcaitech ColossalAI AGIEvalDataset
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Benchmarking |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
AGIEvalDataset is a dataset wrapper class that loads AGIEval benchmark data and converts it into the ColossalEval inference format. It covers question-answering and cloze-style tasks in both English and Chinese.
Description
The class extends BaseDataset and provides a static load method that reads JSONL files from the AGIEval dataset directory. It handles multiple subcategories including English QA datasets (LSAT, SAT, AQUA-RAT), Chinese QA datasets (LogiQA, JEC-QA, Gaokao subjects), and cloze datasets for both languages. The module also includes helper functions get_prompt for formatting individual questions and combine_prompt for constructing few-shot demonstration prompts from CSV files.
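The loading and prompt-formatting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in agieval.py: the helper names read_jsonl, format_question, and build_few_shot_prompt are hypothetical, and the record fields "passage", "question", and "options" are assumed for the example.

```python
import json
from typing import Dict, List


def read_jsonl(path: str) -> List[Dict]:
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def format_question(record: Dict) -> str:
    """Hypothetical formatter: optional passage, then question, then options.

    The field names used here are assumptions made for this sketch.
    """
    parts = []
    if record.get("passage"):
        parts.append(record["passage"])
    parts.append(record["question"])
    parts.extend(record.get("options") or [])  # cloze items may have no options
    return "\n".join(parts)


def build_few_shot_prompt(demos: List[str], query: str) -> str:
    """Hypothetical combiner: demonstrations first, the query last."""
    return "\n\n".join(demos + [query])


# Illustrative record, not taken from the benchmark.
sample = {
    "passage": None,
    "question": "Q: 2 + 2 = ?",
    "options": ["(A) 3", "(B) 4"],
}
print(format_question(sample))
```

The sketch keeps per-question formatting separate from few-shot assembly, mirroring the split between get_prompt and combine_prompt described above.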
Usage
Use this class when you need to evaluate a language model on the AGIEval benchmark within the ColossalEval framework. Instantiate it with the path to the AGIEval data directory, a logger, and a few_shot flag that controls whether few-shot demonstration prompts are loaded.
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalEval/colossal_eval/dataset/agieval.py
- Lines: 1-267
Signature
class AGIEvalDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, few_shot: bool, *args, **kwargs) -> List[Dict]:
Import
from colossal_eval.dataset.agieval import AGIEvalDataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the directory containing AGIEval JSONL files and optional few_shot_prompts.csv |
| logger | DistributedLogger | Yes | Logger instance for distributed logging |
| few_shot | bool | Yes | Whether to load few-shot demonstration prompts from CSV |
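Before constructing the dataset, it can help to confirm that the directory passed as path has the expected layout. The following is a small sketch under that assumption; scan_agieval_dir is a hypothetical helper and is not part of ColossalEval.

```python
import glob
import os
from typing import List, Tuple


def scan_agieval_dir(path: str) -> Tuple[List[str], bool]:
    """Return the sorted JSONL files in `path` and whether the
    optional few_shot_prompts.csv is present alongside them."""
    jsonl_files = sorted(glob.glob(os.path.join(path, "*.jsonl")))
    has_few_shot_csv = os.path.isfile(os.path.join(path, "few_shot_prompts.csv"))
    return jsonl_files, has_few_shot_csv
```

If the second return value is False, passing few_shot=True is likely to fail, since the table above lists the CSV as the source of the demonstration prompts.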
Outputs
| Name | Type | Description |
|---|---|---|
| dataset | Dict[str, Dict] | A nested dictionary whose "test" split maps each subcategory name to an entry with "data" (a list of data samples) and "inference_kwargs" (inference configuration: calculate_loss, all_classes, language, max_new_tokens, and few_shot_data). Note that the return annotation in the signature reads List[Dict], which differs from this nested-dictionary shape. |
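The nested shape described in the table can be pictured with a hand-built example. Everything below is illustrative: the subcategory name, the sample contents, and the inference_kwargs values are placeholders chosen for the sketch, and summarize is a hypothetical helper rather than a ColossalEval function.

```python
from typing import Any, Dict

# Hand-built stand-in for the structure in the Outputs table.
example: Dict[str, Dict[str, Any]] = {
    "test": {
        "hypothetical-subcategory": {
            # One placeholder sample; the real sample fields are not shown here.
            "data": [{"placeholder": "one converted sample"}],
            "inference_kwargs": {
                "calculate_loss": True,           # placeholder value
                "all_classes": ["A", "B", "C"],   # placeholder value
                "language": "English",            # placeholder value
                "max_new_tokens": 32,             # placeholder value
                "few_shot_data": None,            # placeholder value
            },
        }
    }
}


def summarize(dataset: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count the data samples per subcategory within each split."""
    return {
        split: {name: len(entry["data"]) for name, entry in subcategories.items()}
        for split, subcategories in dataset.items()
    }


print(summarize(example))
```

A summary like this is a quick way to check that every expected subcategory was loaded and is non-empty before running inference.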
Usage Examples
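from colossal_eval.dataset.agieval import AGIEvalDataset
from colossalai.logging import DistributedLogger

# Logger used for distributed logging while the data is loaded.
logger = DistributedLogger("agieval")

# Load AGIEval from the data directory with few-shot demonstrations enabled.
dataset = AGIEvalDataset(path="/path/to/agieval/data", logger=logger, few_shot=True)

# Write the converted dataset to a JSON file.
dataset.save("/path/to/output.json")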