Implementation:Hpcaitech ColossalAI AGIEvalDataset

From Leeroopedia


Knowledge Sources
Domains: Evaluation, Benchmarking
Last Updated: 2026-02-09 00:00 GMT

Overview

AGIEvalDataset is a dataset wrapper class that loads and converts AGIEval benchmark data into the ColossalEval inference format, supporting both English and Chinese question-answering and cloze-style tasks.

Description

The class extends BaseDataset and provides a static load method that reads JSONL files from the AGIEval dataset directory. It handles multiple subcategories including English QA datasets (LSAT, SAT, AQUA-RAT), Chinese QA datasets (LogiQA, JEC-QA, Gaokao subjects), and cloze datasets for both languages. The module also includes helper functions get_prompt for formatting individual questions and combine_prompt for constructing few-shot demonstration prompts from CSV files.
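The JSONL-reading and question-formatting steps described above can be sketched roughly as follows. This is a minimal illustration, not the ColossalEval implementation: read_jsonl_dir and format_question are hypothetical helpers, and the record fields "passage", "question", and "options" are assumed for the sake of the example.

```python
import glob
import json
import os
from typing import Dict, List


def read_jsonl_dir(path: str) -> Dict[str, List[dict]]:
    """Map each *.jsonl file stem in `path` to its parsed records (hypothetical helper)."""
    records: Dict[str, List[dict]] = {}
    for file in sorted(glob.glob(os.path.join(path, "*.jsonl"))):
        name = os.path.splitext(os.path.basename(file))[0]
        with open(file, encoding="utf-8") as f:
            # One JSON object per non-empty line.
            records[name] = [json.loads(line) for line in f if line.strip()]
    return records


def format_question(record: dict) -> str:
    """Join the optional passage, the question, and any options into one prompt string.

    The field names used here are assumptions made for this sketch.
    """
    parts = []
    if record.get("passage"):
        parts.append(record["passage"])
    parts.append(record["question"])
    parts.extend(record.get("options") or [])  # cloze-style records may have no options
    return "\n".join(parts)
```

Each file stem plays the role of a subcategory name here; the actual class may name and group subcategories differently.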

Usage

Use this class when you need to evaluate a language model on the AGIEval benchmark within the ColossalEval framework. It is instantiated with a path to the AGIEval data directory, a logger, and a few_shot flag that controls whether few-shot demonstration prompts are loaded.

Code Reference

Source Location

Signature

class AGIEvalDataset(BaseDataset):
    @staticmethod
    def load(path: str, logger: DistributedLogger, few_shot: bool, *args, **kwargs) -> List[Dict]:

Import

from colossal_eval.dataset.agieval import AGIEvalDataset

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the directory containing AGIEval JSONL files and optional few_shot_prompts.csv
logger | DistributedLogger | Yes | Logger instance for distributed logging
few_shot | bool | Yes | Whether to load few-shot demonstration prompts from CSV

Outputs

Name | Type | Description
dataset | Dict[str, Dict] | Nested dictionary with split "test" that maps each subcategory to a dictionary holding "data" (list of data samples) and "inference_kwargs" (inference configuration: calculate_loss, all_classes, language, max_new_tokens, and few_shot_data). Note that the signature above annotates the return type as List[Dict], whereas the value described here is a dictionary.
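To make the nesting concrete, the sketch below builds a dictionary with the keys named in the Outputs table and walks it. The subcategory name and all values are illustrative placeholders, and summarize is a hypothetical helper, not part of ColossalEval.

```python
from typing import Any, Dict

# Illustrative values only; the keys mirror those named in the Outputs table.
example_dataset: Dict[str, Dict[str, Any]] = {
    "test": {
        "sat-en": {  # hypothetical subcategory name
            "data": [],  # list of data samples
            "inference_kwargs": {
                "calculate_loss": True,
                "all_classes": ["A", "B", "C", "D"],
                "language": "English",
                "max_new_tokens": 32,
                "few_shot_data": None,
            },
        }
    }
}


def summarize(dataset: Dict[str, Dict[str, Any]]) -> Dict[str, int]:
    """Return the number of samples per subcategory in the "test" split."""
    return {name: len(sub["data"]) for name, sub in dataset["test"].items()}
```

A helper like this can serve as a quick sanity check that every expected subcategory was loaded before running inference.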

Usage Examples

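from colossal_eval.dataset.agieval import AGIEvalDataset
from colossalai.logging import DistributedLogger

# Logger passed to the dataset for progress messages
logger = DistributedLogger("agieval")

# Build the dataset from the AGIEval directory with few-shot prompts enabled
dataset = AGIEvalDataset(path="/path/to/agieval/data", logger=logger, few_shot=True)

# Write the converted dataset to a JSON file
dataset.save("/path/to/output.json")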
