

Implementation:Hpcaitech ColossalAI Eval Utilities

From Leeroopedia
Revision as of 15:08, 16 February 2026 by Admin (Auto-imported from implementations/Hpcaitech_ColossalAI_Eval_Utilities.md)


Knowledge Sources
Domains: Evaluation, Benchmarking
Last Updated: 2026-02-09 00:00 GMT

Overview

The utilities module provides JSON I/O functions, a JSONL reader, and a distributed rank-check helper used throughout the ColossalEval framework.

Description

The module provides four public functions and two internal helpers. is_rank_0 returns True when the current process is the primary rank (rank 0) or when distributed training is not initialized. jdump serializes Python dictionaries, lists, or strings to a JSON file with configurable indentation and a default fallback for values that are not JSON-serializable. jload deserializes a JSON file into a Python dictionary. get_json_list reads a JSONL file line by line into a list of parsed JSON objects (with null-line handling). The internal helpers _make_w_io_base and _make_r_io_base accept either a file path or an open file object for write and read operations respectively, creating parent directories as needed when writing.
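The file-or-path abstraction described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the module's actual code: the names open_for_write and open_for_read are hypothetical, and the sketch assumes UTF-8 text files and that only the write path creates directories.

```python
import io
import json
import os
import tempfile


def open_for_write(f, mode="w"):
    """Hypothetical stand-in for a write helper: accept a path or an open file."""
    if not isinstance(f, io.IOBase):
        parent = os.path.dirname(f)
        if parent:
            # Create missing parent directories before opening the file.
            os.makedirs(parent, exist_ok=True)
        f = open(f, mode=mode, encoding="utf-8")
    return f


def open_for_read(f, mode="r"):
    """Hypothetical stand-in for a read helper: accept a path or an open file."""
    if not isinstance(f, io.IOBase):
        f = open(f, mode=mode, encoding="utf-8")
    return f


# Round trip through a nested directory that does not exist yet.
tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "nested", "results.json")

with open_for_write(target) as out:
    json.dump({"accuracy": 0.5}, out, indent=4)

with open_for_read(target) as src:
    restored = json.load(src)

# An already-open file object is passed through unchanged.
buffer = io.StringIO()
same_object = open_for_write(buffer) is buffer
```

Accepting either a path or a file object lets callers write to in-memory buffers in tests while production code passes plain path strings.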

Usage

Use these utility functions throughout ColossalEval for reading and writing dataset files, saving evaluation results, and checking distributed rank status. The jdump and jload functions are used by BaseDataset.save and ColossalDataset.load, while get_json_list is used by multiple dataset loaders for JSONL-format data.
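A common reason to combine the rank check with JSON output is to avoid several processes writing the same result file. The sketch below illustrates that pattern under stated assumptions: is_primary_process and save_results_once are hypothetical names, the rank check is a guess at the behavior described for is_rank_0 rather than its actual implementation, and a missing PyTorch installation is treated as a single-process run.

```python
import json
import os
import tempfile


def is_primary_process():
    """Hypothetical rank check: True on rank 0 or when no process group exists."""
    try:
        import torch.distributed as dist
    except ImportError:
        # Assumption for this sketch: without PyTorch, treat the run as single-process.
        return True
    if not dist.is_available() or not dist.is_initialized():
        return True
    return dist.get_rank() == 0


def save_results_once(results, path):
    """Write results from the primary process only; return whether a file was written."""
    if not is_primary_process():
        return False
    with open(path, "w", encoding="utf-8") as out:
        json.dump(results, out, indent=4)
    return True


output_path = os.path.join(tempfile.mkdtemp(), "results.json")
wrote_file = save_results_once({"model": "demo", "score": 1.0}, output_path)
```

In a single-process run no process group is initialized, so the file is written; in a multi-process run only rank 0 would write it.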

Code Reference

Source Location

Signature

def is_rank_0() -> bool:
def _make_w_io_base(f, mode: str):
def _make_r_io_base(f, mode: str):
def jdump(obj, f, mode="w", indent=4, default=str):
def jload(f, mode="r"):
def get_json_list(file_path):

Import

from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0

I/O Contract

Inputs (jdump)

| Name | Type | Required | Description |
|------|------|----------|-------------|
| obj | dict, list, or str | Yes | The object to serialize and write to disk |
| f | str or io.IOBase | Yes | File path string or file-like object to write to |
| mode | str | No | File open mode (default "w") |
| indent | int | No | JSON indentation level (default 4) |
| default | callable | No | Function to handle non-serializable entries (default str) |
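The effect of a default=str fallback can be seen with the standard library's json module directly. The example below does not call jdump; it only shows, with made-up sample data, what the standard json encoder does when it is given default=str.

```python
import json
from datetime import date
from pathlib import PurePosixPath

# Sample record (illustrative data) with two values json cannot encode natively.
record = {
    "run_date": date(2026, 2, 9),
    "output_dir": PurePosixPath("results/run_1"),
    "scores": [1, 2, 3],
}

# Without a fallback, json raises TypeError on the date object.
try:
    json.dumps(record)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# With default=str, unsupported values are converted with str() before encoding.
encoded = json.dumps(record, indent=4, default=str)
decoded = json.loads(encoded)
```

The conversion is one-way: after loading, run_date and output_dir come back as plain strings, not as date or path objects.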

Outputs (jload)

| Name | Type | Description |
|------|------|-------------|
| return | dict | The parsed JSON content as a Python dictionary |

Outputs (get_json_list)

| Name | Type | Description |
|------|------|-------------|
| return | List[dict] | A list of parsed JSON objects, one per line of the JSONL file |
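Reading JSONL line by line can be sketched as below. This is a minimal illustration, not the implementation of get_json_list: read_jsonl is a hypothetical name, and skipping blank lines is an assumption of this sketch that may differ from the module's own null-line handling.

```python
import json
import os
import tempfile


def read_jsonl(file_path):
    """Hypothetical JSONL reader: one parsed JSON value per non-blank line."""
    records = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Assumption for this sketch: blank lines are skipped.
                continue
            records.append(json.loads(line))
    return records


# Write a small sample file with a blank line between two records.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write('{"id": 1, "question": "2 + 2?"}\n')
    f.write("\n")
    f.write('{"id": 2, "question": "3 + 3?"}\n')

rows = read_jsonl(sample_path)
```

Parsing each line independently means a malformed line raises an error at that line, which makes bad records easier to locate than with a single large JSON array.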

Usage Examples

from colossal_eval.utils import jdump, jload, get_json_list, is_rank_0

# Check if current process is rank 0
if is_rank_0():
    print("Running on primary rank")

# Save a dictionary to JSON
data = {"key": "value", "scores": [1, 2, 3]}
jdump(data, "/path/to/output.json")

# Load a JSON file
loaded = jload("/path/to/output.json")

# Read a JSONL file
json_list = get_json_list("/path/to/data.jsonl")
