Implementation:NVIDIA NeMo Aligner Process To Regression Format
| Knowledge Sources | |
|---|---|
| Domains | SteerLM, Reward Modeling, Data Preprocessing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A script that converts data from the attribute-conditioned SFT training format into the regression reward model training format, producing text-label pairs suitable for training SteerLM regression reward models.
Description
process_to_regression_format.py performs the final conversion step in the SteerLM data preparation pipeline. It transforms attribute-conditioned SFT data into the format required for training regression reward models:
- Input parsing: Reads a JSONL file where each line contains a conversation with labeled assistant turns (attribute:score pairs).
- Text formatting: For each conversation, builds the full text using the NeMo `<extra_id_*>` template:
  - `<extra_id_0>System\n{system_prompt}\n` for the system message
  - `<extra_id_1>User\n{message}\n` for user turns
  - `<extra_id_1>Assistant\n{message}\n` for assistant turns
- At each labeled assistant turn, the text up to and including that turn is captured with a `<extra_id_2>` label prefix appended.
- Label vector construction: The attribute:score string (e.g., `helpfulness:3,correctness:4`) is parsed, and a numeric label vector is constructed with one float per attribute in the full SteerLM attribute order (quality, toxicity, humor, creativity, helpfulness, correctness, coherence, complexity, verbosity). Attributes not present in the annotation receive a value of `-100`, a sentinel for missing attributes.
- Output writing: Each labeled turn produces one output line of the form `{"text": "...", "label": [...]}`.
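The text-formatting and turn-capture steps can be sketched as follows. This is a minimal illustration, not the script's own code: it assumes each input record has a `system` string and a `conversations` list whose turns carry `from` (`"User"` or `"Assistant"`), `value`, and an optional `label`. Those field names and the helper name `labeled_prefixes` are hypothetical.

```python
from typing import Iterator, Tuple

# Templates as described on this page.
SYSTEM_TEMPLATE = "<extra_id_0>System\n{}\n"
USER_TEMPLATE = "<extra_id_1>User\n{}\n"
ASSISTANT_TEMPLATE = "<extra_id_1>Assistant\n{}\n"
LABEL_PREFIX = "<extra_id_2>"


def labeled_prefixes(sample: dict) -> Iterator[Tuple[str, str]]:
    """Yield (text, label_string) once per labeled assistant turn.

    Hypothetical schema: sample["system"] plus sample["conversations"], a list of
    {"from": "User" | "Assistant", "value": str, "label": str | None}.
    """
    text = SYSTEM_TEMPLATE.format(sample["system"])
    for turn in sample["conversations"]:
        if turn["from"] == "User":
            text += USER_TEMPLATE.format(turn["value"])
            continue
        text += ASSISTANT_TEMPLATE.format(turn["value"])
        label = turn.get("label")
        if label:
            # Everything up to and including this turn, plus the label prefix.
            yield text + LABEL_PREFIX, label
```

Because `text` keeps growing across turns, a conversation with two labeled assistant turns yields two examples, and the second contains the first conversation prefix.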
Usage
Use this script when:
- You have attribute-conditioned SFT data (from HelpSteer preprocessing or attribute annotation) and need to convert it for regression reward model training
- You are setting up the SteerLM regression reward model training pipeline
Code Reference
Source Location
- Repository: NVIDIA_NeMo_Aligner
- File: `examples/nlp/data/steerlm/process_to_regression_format.py`
- Lines: 1-92
Signature
```python
def process_sample(line, fout): ...
def parse(s): ...
def prepare_args(): ...
def main(args): ...
```
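The signature list includes `parse(s)`, which handles the attribute:score strings described earlier. A minimal sketch of that parsing and of the label-vector step might look like the following. The names `parse_attribute_scores` and `label_vector` are hypothetical, and the sketch assumes pairs are separated by `,` and each name and score by `:`.

```python
from typing import Dict, List

# Full SteerLM attribute order, as listed on this page.
ALL_STEERLM_ATTRIBUTES = [
    "quality", "toxicity", "humor", "creativity", "helpfulness",
    "correctness", "coherence", "complexity", "verbosity",
]
MISSING = -100.0  # sentinel for attributes absent from the annotation


def parse_attribute_scores(s: str) -> Dict[str, float]:
    """Turn 'helpfulness:3,correctness:4' into {'helpfulness': 3.0, 'correctness': 4.0}."""
    scores = {}
    for item in s.split(","):
        name, value = item.split(":")
        scores[name.strip()] = float(value)
    return scores


def label_vector(s: str) -> List[float]:
    """One float per attribute in ALL_STEERLM_ATTRIBUTES, MISSING where absent."""
    given = parse_attribute_scores(s)
    return [given.get(attr, MISSING) for attr in ALL_STEERLM_ATTRIBUTES]
```

For example, `label_vector("helpfulness:3,correctness:4")` returns `[-100.0, -100.0, -100.0, -100.0, 3.0, 4.0, -100.0, -100.0, -100.0]`. Fixing the attribute order gives every record the same 9-element label regardless of which attributes a dataset annotates.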
Import
```python
# Requires examples/nlp/data/steerlm/ to be the working directory or on sys.path
from process_to_regression_format import process_sample, parse
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `--input-file` | str | Yes | Path to input JSONL file in attribute-conditioned SFT format |
| `--output-file` | str | Yes | Path to output JSONL file in regression format |
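The two required flags map naturally onto `argparse`. The sketch below is a hypothetical stand-in for the script's `prepare_args()`: the real help strings and any additional options are not shown on this page, and `build_parser` is an illustrative name chosen so the parser can be exercised with an explicit argument list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --input-file/--output-file flags."""
    parser = argparse.ArgumentParser(
        description="Convert attribute-conditioned SFT data to regression RM format."
    )
    # Both flags are required; argparse exposes them as args.input_file / args.output_file.
    parser.add_argument("--input-file", type=str, required=True,
                        help="Input JSONL in attribute-conditioned SFT format")
    parser.add_argument("--output-file", type=str, required=True,
                        help="Output JSONL in regression format")
    return parser
```

Note that argparse converts the dashes to underscores in attribute names, and omitting either flag makes it exit with a usage error.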
Outputs
| Name | Type | Description |
|---|---|---|
| output-file | JSONL file | Regression training data; each line is a JSON object with `text` (the formatted conversation up to a labeled turn, ending with the `<extra_id_2>` prefix) and `label` (a list of 9 floats, one per SteerLM attribute, with `-100` for missing attributes) |
Each output line has the following structure (pretty-printed here; in the file each object occupies a single line):
```json
{
  "text": "<extra_id_0>System\nA chat between...\n<extra_id_1>User\nHello\n<extra_id_1>Assistant\nHi there!\n<extra_id_2>",
  "label": [3.0, 0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
}
```
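Before training, it can be worth checking that every output line matches this structure. The validator below is hypothetical (`check_regression_record` is an illustrative name, not part of the script) and assumes the 9-attribute layout and `<extra_id_2>` suffix described above.

```python
import json
from numbers import Real
from typing import List

NUM_ATTRIBUTES = 9
LABEL_PREFIX = "<extra_id_2>"


def check_regression_record(line: str) -> List[str]:
    """Return a list of problems found in one JSONL line (empty if it looks valid)."""
    problems = []
    record = json.loads(line)
    missing = {"text", "label"} - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    text = record.get("text")
    if not isinstance(text, str) or not text.endswith(LABEL_PREFIX):
        problems.append("text must be a string ending with " + LABEL_PREFIX)
    label = record.get("label")
    if not isinstance(label, list) or len(label) != NUM_ATTRIBUTES:
        problems.append(f"label must be a list of {NUM_ATTRIBUTES} numbers")
    elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in label):
        # bool is a subclass of int, so it is excluded explicitly.
        problems.append("label entries must be numeric")
    return problems
```

Returning a list of problems rather than raising makes it easy to report every malformed line in one pass over the file.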
Usage Examples
```bash
# Command-line usage:
python process_to_regression_format.py \
    --input-file /data/helpsteer_processed/train.jsonl \
    --output-file /data/helpsteer_regression/train.jsonl
```
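One way to picture what the conversion does end to end is a driver that reads the input JSONL record by record and hands each decoded record to a per-sample function that writes its own output lines, the shape suggested by `process_sample(line, fout)`. The sketch below assumes that function receives the decoded JSON record; `convert_stream` and `convert_file` are hypothetical names, and the per-sample function is passed in as a parameter.

```python
import json
from typing import Callable, Iterable


def convert_stream(lines: Iterable[str], fout, process_sample: Callable) -> int:
    """Decode each non-blank JSONL line and pass it to process_sample(record, fout).

    Returns the number of records processed.
    """
    count = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        process_sample(json.loads(raw), fout)
        count += 1
    return count


def convert_file(input_path: str, output_path: str, process_sample: Callable) -> int:
    """File-based wrapper: stream input_path into output_path."""
    with open(input_path, encoding="utf-8") as fin, \
            open(output_path, "w", encoding="utf-8") as fout:
        return convert_stream(fin, fout, process_sample)
```

Iterating over the file object reads one line at a time, so memory use stays flat even for large training files.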