Implementation:Hpcaitech ColossalAI Code Reward Utils
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement Learning, Code Evaluation, Reward Modeling |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Code correctness checking utilities for evaluating generated code solutions in RLHF reward computation.
Description
This module provides functions that check the correctness of code generated by language models as part of the reward computation pipeline. The check_correctness function runs the test cases against the generated code in a separate process. Besides the per-test-case timeout, it enforces a global timeout of 10 minutes as a safeguard for rare edge cases that the per-test timeouts do not catch. The check_correctness_code_api function is an HTTP API-based alternative that delegates correctness checking to a remote service at a configurable URL. Both functions are adapted from the PRIME and verl projects and are used in the code reward pipeline of ColossalChat's distributed RLHF system.
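The separate-process pattern with a global timeout can be sketched with Python's standard multiprocessing module. This is a minimal sketch under stated assumptions, not the module's code. run_tests, _worker and check_with_global_timeout are hypothetical names, and run_tests is a toy stand-in for the real per-test runner. The sketch also passes results back through a Pipe and uses the POSIX-only "fork" start method to stay self-contained.

```python
import multiprocessing
import time


def run_tests(in_outs, generation):
    # Toy stand-in for the real per-test runner: "hang" simulates a stuck
    # program, anything else reports every test case as passed.
    if generation == "hang":
        time.sleep(60)
    return [True for _ in in_outs["inputs"]]


def _worker(in_outs, generation, conn):
    # Runs in the child process and sends the per-test results to the parent.
    try:
        conn.send(run_tests(in_outs, generation))
    except Exception:
        conn.send([-1 for _ in in_outs["inputs"]])
    finally:
        conn.close()


def check_with_global_timeout(in_outs, generation, global_timeout=600.0):  # 600 s = 10 minutes
    ctx = multiprocessing.get_context("fork")  # POSIX only
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)
    proc = ctx.Process(target=_worker, args=(in_outs, generation, child_conn))
    proc.start()
    child_conn.close()  # parent keeps only the receive end, so a dead child means EOF
    results = None
    try:
        if parent_conn.poll(global_timeout):  # wait at most global_timeout seconds
            results = parent_conn.recv()
    except EOFError:
        results = None  # child exited without sending anything
    finally:
        parent_conn.close()
    if proc.is_alive():
        proc.kill()  # stuck child: kill it instead of blocking the caller
    proc.join()
    if results is None:
        # Nothing came back in time: count every test case as failed (-1).
        return [-1 for _ in in_outs["inputs"]]
    return results


if __name__ == "__main__":
    cases = {"inputs": ["1\n", "2\n"], "outputs": ["1\n", "4\n"]}
    print(check_with_global_timeout(cases, "ok", global_timeout=5))    # [True, True]
    print(check_with_global_timeout(cases, "hang", global_timeout=1))  # [-1, -1]
```

Killing the child instead of waiting on it is what makes the global timeout useful. A generation that loops forever costs at most global_timeout seconds instead of stalling the reward computation.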
Usage
Use these utilities when implementing code-based reward functions for RLHF training where the model generates code and correctness is verified by running test cases against the generated solutions.
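A code reward function usually has to pull the code out of the model's response before testing it. The sketch below assumes the response wraps its solution in a markdown code fence and uses a binary reward (1.0 only when every test case passes). extract_code, code_reward and the check_fn parameter are hypothetical. Treating True as a pass is also an assumption, since this page only documents -1 as the failure marker. check_fn can be check_correctness wrapped in a lambda.

```python
import re
from typing import Callable, List, Optional, Tuple

FENCE = "`" * 3  # built programmatically so this snippet can itself sit inside a fence
# Captures the body of a fenced block opened with FENCE or FENCE + "python".
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> Optional[str]:
    """Return the body of the last fenced code block in response, or None."""
    blocks = CODE_BLOCK.findall(response)
    return blocks[-1] if blocks else None


def code_reward(
    response: str,
    in_outs: dict,
    check_fn: Callable[[dict, str], Tuple[List, object]],
) -> float:
    """Binary reward: 1.0 if every test case passes, otherwise 0.0."""
    code = extract_code(response)
    if code is None:
        return 0.0  # no code block to run counts as a failure
    results, _metadata = check_fn(in_outs, code)
    return 1.0 if results and all(r is True for r in results) else 0.0


# Wiring in the real checker (requires ColossalChat's coati package):
# from coati.distributed.reward.code_reward.utils import check_correctness
# reward = code_reward(response, in_outs,
#                      lambda io, code: check_correctness(io, code, timeout=10, debug=False))
```

A binary reward is the simplest choice. Partial credit (for example, the fraction of passing test cases) is a common alternative but gives the policy a signal for code that is only partly correct.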
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalChat/coati/distributed/reward/code_reward/utils.py
- Lines: 1-71
Signature
```python
def check_correctness(in_outs: Optional[dict], generation, timeout=10, debug=True):
def check_correctness_code_api(
    in_outs: Optional[dict], generation, timeout=10, debug=True,
    url="http://localhost:8000/check_correctness"
):
```
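check_correctness_code_api takes the same arguments as check_correctness plus url, and it moves the test run to a remote service reached over HTTP. The sketch below shows one way such a client could look using only the standard library. check_via_http and build_payload are hypothetical. The JSON field names (in_outs, generation, timeout and debug in the request; result and metadata in the reply) are assumptions, not the service's documented schema. Falling back to all -1 on a network error is a design choice of this sketch.

```python
import json
import urllib.request


def build_payload(in_outs, generation, timeout=10, debug=True):
    # Assumed request body: the same arguments check_correctness receives.
    return {"in_outs": in_outs, "generation": generation, "timeout": timeout, "debug": debug}


def check_via_http(in_outs, generation, timeout=10, debug=True,
                   url="http://localhost:8000/check_correctness"):
    """Hypothetical client: POST the test cases and code, return (result, metadata)."""
    body = json.dumps(build_payload(in_outs, generation, timeout, debug)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    # Arbitrary client-side limit: every test case may use its full timeout, plus slack.
    http_timeout = timeout * len(in_outs["inputs"]) + 5
    try:
        with urllib.request.urlopen(request, timeout=http_timeout) as response:
            reply = json.loads(response.read().decode("utf-8"))
        # Assumed response keys; match them to what the service actually returns.
        return reply["result"], reply["metadata"]
    except (OSError, KeyError, TypeError, ValueError):
        # OSError covers URLError, HTTPError and socket timeouts; ValueError covers
        # malformed JSON. Either way, count every test case as failed.
        return [-1 for _ in in_outs["inputs"]], {}
```

Sending the same arguments as check_correctness keeps the two entry points interchangeable from the caller's side.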
Import
```python
from coati.distributed.reward.code_reward.utils import check_correctness, check_correctness_code_api
```
I/O Contract
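Inputs (check_correctness and check_correctness_code_api)
| Name | Type | Required | Description |
|---|---|---|---|
| in_outs | Optional[dict] | Yes | Test cases as a dictionary with an "inputs" list and an "outputs" list of expected outputs, one entry per test case |
| generation | str | Yes | The generated code string to test |
| timeout | int | No | Timeout in seconds for each test case, defaults to 10 |
| debug | bool | No | Whether to print debug information, defaults to True |
| url | str | No | check_correctness_code_api only: endpoint of the remote checking service, defaults to "http://localhost:8000/check_correctness" |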
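For stdin/stdout-style tests like the one in the usage example, entry i of "inputs" is paired with entry i of "outputs". A small hypothetical helper (make_in_outs is not part of the module) can build the dictionary from (input, expected output) pairs so the two lists cannot drift out of step. It covers only this string-based form.

```python
from typing import Dict, Iterable, List, Tuple


def make_in_outs(cases: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an in_outs dict from (stdin text, expected stdout) pairs."""
    inputs: List[str] = []
    outputs: List[str] = []
    for stdin_text, expected in cases:
        if not isinstance(stdin_text, str) or not isinstance(expected, str):
            raise TypeError("each test case needs a string input and a string output")
        inputs.append(stdin_text)
        outputs.append(expected)
    if not inputs:
        raise ValueError("in_outs needs at least one test case")
    # Entry i of "inputs" is fed to the program; entry i of "outputs" is what it should print.
    return {"inputs": inputs, "outputs": outputs}


in_outs = make_in_outs([("5\n", "25\n"), ("10\n", "100\n")])
# in_outs == {"inputs": ["5\n", "10\n"], "outputs": ["25\n", "100\n"]}
```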
Outputs (both functions return them as a tuple)
| Name | Type | Description |
|---|---|---|
| result | list | Per-test-case results; -1 marks a failed test case |
| metadata_list | list | List of metadata dictionaries from test execution |
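When logging results or shaping rewards, it can help to collapse the result list into counts. summarize_results is a hypothetical helper. It counts an entry as passed only when it is True, which is an assumption because the table above documents only -1 (failure).

```python
from typing import Dict, Sequence


def summarize_results(results: Sequence) -> Dict[str, float]:
    """Count passing test cases; anything other than True is treated as a failure."""
    total = len(results)
    passed = sum(1 for r in results if r is True)
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate": passed / total if total else 0.0,  # avoid dividing by zero
    }


print(summarize_results([True, -1, True, False]))
# {'total': 4, 'passed': 2, 'failed': 2, 'pass_rate': 0.5}
```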
Usage Examples
```python
from coati.distributed.reward.code_reward.utils import check_correctness, check_correctness_code_api

# Two stdin/stdout test cases: the program should print the square of its input.
in_outs = {
    "inputs": ["5\n", "10\n"],
    "outputs": ["25\n", "100\n"],
}
generated_code = "n = int(input())\nprint(n * n)"

if __name__ == "__main__":  # check_correctness starts a child process
    # Run the test cases locally in a separate process
    results, metadata = check_correctness(in_outs, generated_code, timeout=10, debug=True)

    # Delegate the same check to the remote service (it must be running at this URL)
    results, metadata = check_correctness_code_api(
        in_outs, generated_code, url="http://localhost:8000/check_correctness"
    )
```