Principle: SqueezeAILab ETS Result Collection
| Knowledge Sources | |
|---|---|
| Domains | Data_Management, Experiment_Logging |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
A data serialization pattern that collects tree search results and statistics into structured JSON output files for downstream evaluation.
Description
After the tree search completes for all questions, the results must be serialized into a standardized format that the evaluation pipeline can consume. This involves two outputs:
- answers.json: A JSON array where each element contains the question, all candidate answers with their step-by-step PRM scores, the ground truth answer, and total token count
- stats.log: A text log recording aggregate statistics including total KV cache size, number of model calls, number of tokens generated, and wall-clock time
The answer format is designed to support multiple downstream evaluation strategies (best-of-n, majority voting) by preserving all candidates rather than selecting a single answer.
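The two outputs can be sketched as a small serialization step. This is a minimal illustration, not the actual `rebase.py` code: the function name `save_results` and the `stats_path` parameter are assumptions, while the result fields follow the answers.json format described above.

```python
import json

def save_results(results, output_path, stats, stats_path="stats.log"):
    """Illustrative sketch: persist search results and aggregate stats.

    `save_results` and `stats_path` are hypothetical names; the field
    layout of `results` mirrors the answers.json format above.
    """
    # answers.json: a single JSON array, one element per question
    with open(output_path, "w") as f:
        json.dump(results, f, indent=2)
    # stats.log: plain-text key/value lines of aggregate statistics
    with open(stats_path, "w") as f:
        for key, value in stats.items():
            f.write(f"{key}: {value}\n")
```

Keeping the statistics in a separate plain-text log lets the JSON array stay strictly machine-readable while the run metadata remains easy to inspect by eye.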
Usage
This step executes automatically at the end of main() in rebase.py, after all questions have been processed by reward_guided_search.run_batch(). The output file path is specified via --output_path.
Theoretical Basis
Preserving all candidate trajectories (rather than selecting the best one) enables post-hoc evaluation with different strategies:
```python
# Abstract result format: one dict per question
results = [
    {
        "id": question_index,
        "question": problem_text,
        "model_answer": [
            {"text": full_trajectory, "step_scores": [s1, s2, ...]},
            # ... one entry per leaf-node trajectory
        ],
        "ground_truth_answer": reference,
        "total_tokens": token_count,
    }
    # ... one such dict for each question
]
```
This decouples the search phase from the evaluation phase, allowing researchers to compare different aggregation and voting strategies without re-running the expensive tree search.
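As a concrete illustration of post-hoc evaluation over preserved candidates, the sketch below implements two such strategies against the abstract result format. Both functions are hypothetical: scoring a trajectory by its minimum PRM step score is one common aggregation (the actual pipeline may use the last-step score or a product instead), and `extract` stands in for whatever answer-extraction logic the evaluation uses.

```python
from collections import Counter

def best_of_n(candidates):
    # Score each trajectory by its minimum PRM step score (an assumed
    # aggregation) and return the text of the highest-scoring one.
    return max(candidates, key=lambda c: min(c["step_scores"]))["text"]

def majority_vote(candidates, extract=lambda text: text.split()[-1]):
    # `extract` is a placeholder for the real answer-extraction logic
    # (e.g. pulling the boxed final answer out of a math solution).
    votes = Counter(extract(c["text"]) for c in candidates)
    return votes.most_common(1)[0][0]
```

Because every leaf trajectory is kept in `model_answer`, either function can be run (or swapped for a new strategy) over a saved answers.json without touching the search code.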