Implementation:Hpcaitech ColossalAI Eval Utilities

Knowledge Sources	Hpcaitech_ColossalAI
Domains	Evaluation, Benchmarking
Last Updated	2026-02-09 00:00 GMT

Overview

utilities is a helper module that provides JSON read and write functions, a JSONL reader, and a distributed rank check used throughout the ColossalEval framework.

Description

The module defines six functions: four public ones and two internal helpers. is_rank_0 returns True when the current process is the primary rank (rank 0) or when distributed training has not been initialized. jdump serializes a Python dictionary, list, or string to a JSON file, with configurable indentation and a configurable fallback (the default argument) for entries that are not JSON-serializable. jload deserializes a JSON file into a Python dictionary. get_json_list reads a JSONL file line by line into a list of parsed JSON objects, with handling for null lines. The internal helpers _make_w_io_base and _make_r_io_base let callers pass either a file path or an already-open file object for write and read operations respectively; missing parent directories are created automatically when writing to a path.
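The file-or-path abstraction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's source: the names open_for_write and open_for_read are hypothetical, and the UTF-8 encoding is an assumption.

```python
import io
import os


def open_for_write(f, mode="w"):
    """Hypothetical write helper: accept a path or an already-open file."""
    if not isinstance(f, io.IOBase):
        parent = os.path.dirname(f)
        if parent:
            # Create missing parent directories before opening the file.
            os.makedirs(parent, exist_ok=True)
        f = open(f, mode=mode, encoding="utf-8")  # encoding is an assumption
    return f


def open_for_read(f, mode="r"):
    """Hypothetical read helper: accept a path or an already-open file."""
    if not isinstance(f, io.IOBase):
        f = open(f, mode=mode, encoding="utf-8")  # encoding is an assumption
    return f
```

Passing an open file object returns it unchanged, so callers can hand in an in-memory buffer such as io.StringIO during tests instead of touching the disk.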

Usage

Use these utility functions throughout ColossalEval for reading and writing dataset files, saving evaluation results, and checking distributed rank status. The jdump and jload functions are used by BaseDataset.save and ColossalDataset.load, while get_json_list is used by multiple dataset loaders for JSONL-format data.
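The rank check is typically used to guard file writes so that only one process saves results. The sketch below restates the check's described logic; it assumes the check is backed by torch.distributed, which this page does not confirm, and the name is_primary_rank is hypothetical.

```python
def is_primary_rank() -> bool:
    """Hypothetical restatement of the rank check described above."""
    try:
        import torch.distributed as dist  # assumption: PyTorch backs the check
    except ImportError:
        return True  # no distributed runtime installed: act as a single process
    if not dist.is_available() or not dist.is_initialized():
        return True  # distributed training not initialized
    return dist.get_rank() == 0


if is_primary_rank():
    # Only the primary rank (or a non-distributed run) reaches this branch.
    print("primary rank: safe to write results")
```

In a single-process run the function returns True, so the same script works with and without a distributed launcher.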

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalEval/colossal_eval/utils/utilities.py
Lines: 1-63

Signature

def is_rank_0() -> bool:
def _make_w_io_base(f, mode: str):
def _make_r_io_base(f, mode: str):
def jdump(obj, f, mode="w", indent=4, default=str):
def jload(f, mode="r"):
def get_json_list(file_path):

Import

from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0

I/O Contract

Inputs (jdump)

Name	Type	Required	Description
obj	dict, list, or str	Yes	The object to serialize and write to disk
f	str or io.IOBase	Yes	File path string or file-like object to write to
mode	str	No	File open mode (default "w")
indent	int	No	JSON indentation level (default 4)
default	callable	No	Function to handle non-serializable entries (default str)
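The effect of indent and default can be seen with the standard json module directly. The sketch assumes jdump forwards both arguments to the standard serializer, which this page does not state explicitly; the record contents are made up for illustration.

```python
import json
from datetime import date
from pathlib import PurePosixPath

# Illustrative record holding two values that json cannot serialize natively.
record = {"run_date": date(2024, 1, 2), "path": PurePosixPath("results/out.json")}

# Without a fallback, json.dumps raises TypeError for the date and path objects.
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True

# With default=str, each unsupported value is replaced by str(value).
text = json.dumps(record, indent=4, default=str)
print(text)
```

Using str as the fallback keeps serialization from failing on stray objects, at the cost of those values coming back as plain strings when the file is loaded again.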

Outputs (jload)

Name	Type	Description
return	dict	The parsed JSON content as a Python dictionary

Outputs (get_json_list)

Name	Type	Description
return	List[dict]	A list of parsed JSON objects, one per line of the JSONL file
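A line-by-line JSONL reader of the kind described here might look like the sketch below. The function name read_jsonl is hypothetical, and skipping blank lines is a choice made for this sketch rather than documented behavior of get_json_list.

```python
import json
import os
import tempfile


def read_jsonl(file_path):
    """Hypothetical JSONL reader: one JSON value per non-empty line."""
    records = []
    with open(file_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skipping blank lines is a choice made for this sketch
            records.append(json.loads(line))  # a "null" line parses to None
    return records


# Build a small JSONL file to demonstrate.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.jsonl")
with open(sample_path, "w", encoding="utf-8") as fh:
    fh.write('{"id": 1, "question": "2+2?"}\n')
    fh.write("null\n")
    fh.write('{"id": 2, "question": "3+3?"}\n')

rows = read_jsonl(sample_path)
```

Because a null line parses to None, callers that expect dictionaries should filter or check each entry before indexing into it.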

Usage Examples

from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0

# Check if current process is rank 0
if is_rank_0():
    print("Running on primary rank")

# Save a dictionary to JSON
data = {"key": "value", "scores": [1, 2, 3]}
jdump(data, "/path/to/output.json")

# Load a JSON file
loaded = jload("/path/to/output.json")

# Read a JSONL file
json_list = get_json_list("/path/to/data.jsonl")
