Implementation:Hpcaitech ColossalAI Code Reward Utils

Knowledge Sources	Hpcaitech_ColossalAI
Domains	Reinforcement Learning, Code Evaluation, Reward Modeling
Last Updated	2026-02-09 00:00 GMT

Overview

Code correctness checking utilities for evaluating generated code solutions in RLHF reward computation.

Description

This module provides functions for checking the correctness of code generated by language models as part of the reward computation pipeline. The check_correctness function runs the test cases against the generated code in a separate process and enforces a global timeout of 10 minutes on that process, so that edge cases such as a hung run cannot stall reward computation. The check_correctness_code_api function is an HTTP-based alternative that delegates the same check to a remote service. Both functions are adapted from the PRIME and verl projects and are used in the code reward pipeline of ColossalChat's distributed RLHF system.
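The separate-process idea can be illustrated with a minimal sketch. It is not the module's code: it uses the standard-library subprocess module, starts one fresh interpreter per test case, and combines a per-test timeout with an overall time budget that stands in for the global timeout. The helper name run_stdin_tests, its parameters, and the True / -1 result encoding are assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_stdin_tests(code, in_outs, per_test_timeout=10.0, global_timeout=600.0):
    """Hypothetical helper: run `code` once per test case in a child interpreter."""
    deadline = time.monotonic() + global_timeout
    results = []
    for stdin_text, expected in zip(in_outs["inputs"], in_outs["outputs"]):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            results.append(-1)  # overall budget exhausted: count as failed
            continue
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=min(per_test_timeout, remaining),
            )
        except subprocess.TimeoutExpired:
            results.append(-1)  # subprocess.run killed the child after the timeout
            continue
        # Assumed pass criterion: clean exit and matching stdout, ignoring outer whitespace
        passed = proc.returncode == 0 and proc.stdout.strip() == expected.strip()
        results.append(True if passed else -1)
    return results
```

Running the generated code in a child process means an infinite loop or a crash in that code affects only the child, and the timeout gives the caller a bounded wait either way.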

Usage

Use these utilities when implementing code-based reward functions for RLHF training where the model generates code and correctness is verified by running test cases against the generated solutions.

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalChat/coati/distributed/reward/code_reward/utils.py
Lines: 1-71

Signature

def check_correctness(in_outs: Optional[dict], generation, timeout=10, debug=True):

def check_correctness_code_api(
    in_outs: Optional[dict], generation, timeout=10, debug=True,
    url="http://localhost:8000/check_correctness"
):

Import

from coati.distributed.reward.code_reward.utils import check_correctness, check_correctness_code_api

I/O Contract

Inputs (check_correctness)

Name	Type	Required	Description
in_outs	Optional[dict]	Yes	Dictionary of test cases: "inputs" holds the test inputs and "outputs" holds the corresponding expected outputs
generation	str	Yes	The generated code string to test
timeout	int	No	Timeout in seconds for each test case, defaults to 10
debug	bool	No	Whether to print debug information, defaults to True
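A caller may want to check the shape of in_outs before handing it to the checker. The sketch below is a hypothetical pre-flight helper, not part of the module; it assumes "inputs" and "outputs" are lists of equal length, as in the usage example on this page. How check_correctness itself treats None is not documented here, so the sketch simply rejects it.

```python
def validate_in_outs(in_outs):
    """Hypothetical pre-flight check; returns the number of test cases.

    Raises ValueError if the test-case dictionary is malformed.
    """
    if not isinstance(in_outs, dict):
        raise ValueError("in_outs must be a dict")
    for key in ("inputs", "outputs"):
        if not isinstance(in_outs.get(key), list):
            raise ValueError(f"in_outs[{key!r}] must be a list")
    if len(in_outs["inputs"]) != len(in_outs["outputs"]):
        raise ValueError("inputs and outputs must have the same length")
    return len(in_outs["inputs"])
```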

Outputs

Name	Type	Description
result	list	List with one entry per test case; an entry of -1 marks that test case as failed
metadata_list	list	List of metadata dictionaries from test execution
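The result list can be reduced to a scalar for use as a reward. The sketch below is one possible mapping, not the pipeline's actual reward: it assumes a passing test case is reported as the Python value True and treats every other entry, including -1, as a failure. The function name pass_rate is hypothetical.

```python
def pass_rate(result):
    """Hypothetical reward: fraction of test cases whose entry is exactly True."""
    if not result:
        return 0.0  # no test results: treat as no reward
    passed = sum(1 for entry in result if entry is True)
    return passed / len(result)
```

A stricter variant would return 1.0 only when every entry passes and 0.0 otherwise; which form suits training depends on how much partial credit is wanted.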

Usage Examples

from coati.distributed.reward.code_reward.utils import (
    check_correctness,
    check_correctness_code_api,
)

# Test cases: each input is paired with its expected output
in_outs = {
    "inputs": ["5\n", "10\n"],
    "outputs": ["25\n", "100\n"],
}
# Generated solution to check: reads an integer and prints its square
generated_code = "n = int(input())\nprint(n * n)"

# Run the test cases locally in a separate process
results, metadata = check_correctness(in_outs, generated_code, timeout=10, debug=True)

# Delegate the same check to a remote service over HTTP
results, metadata = check_correctness_code_api(
    in_outs, generated_code, url="http://localhost:8000/check_correctness"
)
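To make the API-based path more concrete, the sketch below shows how the arguments might be packaged as a JSON POST request using only the standard library. The payload field names mirror the function's parameters and are an assumption; the request and response schema of the remote service is not documented on this page. The helper build_check_request is hypothetical and only builds the request without sending it.

```python
import json
import urllib.request


def build_check_request(in_outs, generation, timeout=10, debug=True,
                        url="http://localhost:8000/check_correctness"):
    """Hypothetical: package the arguments as a JSON POST request (not sent here)."""
    payload = {  # assumed field names, mirroring the function's parameters
        "in_outs": in_outs,
        "generation": generation,
        "timeout": timeout,
        "debug": debug,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would be a call such as urllib.request.urlopen(req, timeout=30), after which the caller has to parse whatever format the service actually returns.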

Related Pages

Environment:Hpcaitech_ColossalAI_CUDA_GPU_Environment
