Implementation:Hpcaitech ColossalAI Eval Utilities
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Benchmarking |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
The utilities module provides JSON I/O functions, a JSONL reader, and a distributed rank-check helper used throughout the ColossalEval framework.
Description
The module defines four public functions and two internal helpers:
- is_rank_0 returns True when the current process is the primary rank (rank 0) or when distributed training is not initialized.
- jdump serializes a Python dictionary, list, or string to a JSON file, with configurable indentation and a default fallback for values that are not JSON-serializable.
- jload deserializes a JSON file into a Python dictionary.
- get_json_list reads a JSONL file line by line into a list of parsed JSON objects (with null-line handling).
- _make_w_io_base and _make_r_io_base handle the file-or-path abstraction for write and read operations respectively, automatically creating parent directories as needed.
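The file-or-path abstraction described above can be sketched with the standard library alone. This is a minimal illustration, not the module's source: open_for_write, open_for_read, dump_json, and load_json are hypothetical names, and the UTF-8 encoding and the choice to create directories only on the write path are assumptions.

```python
import io
import json
import os
import tempfile


def open_for_write(f, mode="w"):
    """Return a writable file object for a path or an already-open file (hypothetical helper)."""
    if isinstance(f, io.IOBase):
        return f  # caller passed an open file; use it as-is
    parent = os.path.dirname(f)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create missing parent directories
    return open(f, mode=mode, encoding="utf-8")  # encoding is an assumption


def open_for_read(f, mode="r"):
    """Return a readable file object for a path or an already-open file (hypothetical helper)."""
    if isinstance(f, io.IOBase):
        return f
    return open(f, mode=mode, encoding="utf-8")


def dump_json(obj, f, indent=4, default=str):
    """Write obj as JSON to a path or file object; the file is closed afterwards."""
    with open_for_write(f) as handle:
        json.dump(obj, handle, indent=indent, default=default)


def load_json(f):
    """Read JSON from a path or file object; the file is closed afterwards."""
    with open_for_read(f) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # The "nested/results" directories do not exist yet; the write helper creates them.
    target = os.path.join(tmp, "nested", "results", "scores.json")
    dump_json({"key": "value", "scores": [1, 2, 3]}, target)
    round_trip = load_json(target)
```

Accepting either a path or an open file keeps call sites short while still letting tests pass in-memory buffers. Note that in this sketch a file object passed by the caller is closed once the write or read completes.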
Usage
Use these utility functions throughout ColossalEval for reading and writing dataset files, saving evaluation results, and checking distributed rank status. The jdump and jload functions are used by BaseDataset.save and ColossalDataset.load, while get_json_list is used by multiple dataset loaders for JSONL-format data.
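A common reason to check the rank before saving results is to avoid several processes writing the same file. The sketch below illustrates that pattern without requiring a distributed backend: instead of querying the distributed runtime as is_rank_0 does, it reads the RANK environment variable (which launchers such as torchrun set) as a stand-in. The names is_primary_process and save_results_once are hypothetical.

```python
import json
import os
import tempfile


def is_primary_process(env=None):
    """Stand-in rank check: a missing RANK variable is treated as 'not distributed'."""
    env = os.environ if env is None else env
    rank = env.get("RANK")
    return rank is None or int(rank) == 0


def save_results_once(results, path, env=None):
    """Write results only from the primary process; return whether a write happened."""
    if not is_primary_process(env):
        return False
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, indent=4)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "results.json")

    # Simulate a non-primary worker: nothing is written.
    wrote_as_worker = save_results_once({"accuracy": 0.5}, out_path, env={"RANK": "1"})
    exists_after_worker = os.path.exists(out_path)

    # Simulate the primary process: the file is written.
    wrote_as_primary = save_results_once({"accuracy": 0.5}, out_path, env={"RANK": "0"})
    exists_after_primary = os.path.exists(out_path)

    # No RANK at all behaves like a single-process run.
    single_process = is_primary_process(env={})
```

Treating "not distributed" as primary mirrors the behavior described for is_rank_0, so the same script works in single-process and multi-process runs.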
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalEval/colossal_eval/utils/utilities.py
- Lines: 1-63
Signature
def is_rank_0() -> bool:
def _make_w_io_base(f, mode: str):
def _make_r_io_base(f, mode: str):
def jdump(obj, f, mode="w", indent=4, default=str):
def jload(f, mode="r"):
def get_json_list(file_path):
Import
from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0
I/O Contract
Inputs (jdump)
| Name | Type | Required | Description |
|---|---|---|---|
| obj | dict, list, or str | Yes | The object to serialize and write to disk |
| f | str or io.IOBase | Yes | File path string or file-like object to write to |
| mode | str | No | File open mode (default "w") |
| indent | int | No | JSON indentation level (default 4) |
| default | callable | No | Function to handle non-serializable entries (default str) |
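The effect of the default parameter can be seen with the standard json module, which accepts a parameter of the same name. The sketch below uses json.dumps directly rather than jdump, and the record contents are made up for illustration.

```python
import json
from datetime import date
from pathlib import PurePosixPath

# A record containing two values that the json module cannot serialize natively.
record = {
    "run_date": date(2024, 1, 2),
    "output_dir": PurePosixPath("results/run1"),
    "scores": [1, 2, 3],
}

# Without a fallback, serialization raises TypeError.
try:
    json.dumps(record)
    strict_failed = False
except TypeError:
    strict_failed = True

# With default=str, each unsupported value is converted via str() before encoding.
serialized = json.dumps(record, default=str)
decoded = json.loads(serialized)
```

The conversion is lossy: after loading, run_date and output_dir are plain strings, not date or path objects, so callers that need the original types must convert them back.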
Outputs (jload)
| Name | Type | Description |
|---|---|---|
| return | dict | The parsed JSON content as a Python dictionary |
Outputs (get_json_list)
| Name | Type | Description |
|---|---|---|
| return | List[dict] | A list of parsed JSON objects, one per line of the JSONL file |
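Line-by-line JSONL parsing can be sketched as follows. This is an illustration using only the standard library, not the source of get_json_list: read_jsonl is a hypothetical name, and skipping blank lines and passing a literal null line through as None are assumptions of this sketch.

```python
import json
import os
import tempfile


def read_jsonl(file_path):
    """Parse one JSON value per line, skipping blank lines (hypothetical helper)."""
    records = []
    with open(file_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore empty lines between records
            records.append(json.loads(line))
    return records


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('{"question": "1+1", "answer": "2"}\n')
        handle.write("\n")  # blank line, skipped by the reader
        handle.write('{"question": "2+2", "answer": "4"}\n')
        handle.write("null\n")  # valid JSON; parsed as None
    parsed = read_jsonl(path)
```

Because a line containing null is valid JSON, a reader like this can return None entries alongside dictionaries; downstream code that expects only dictionaries may need to filter them.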
Usage Examples
from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0
# Check if current process is rank 0
if is_rank_0():
    print("Running on primary rank")
# Save a dictionary to JSON
data = {"key": "value", "scores": [1, 2, 3]}
jdump(data, "/path/to/output.json")
# Load a JSON file
loaded = jload("/path/to/output.json")
# Read a JSONL file
json_list = get_json_list("/path/to/data.jsonl")