Principle:Openai Evals Result Aggregation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Analysis |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A post-processing pattern that extracts and aggregates final results from evaluation log files across multiple runs.
Description
Result Aggregation provides utilities for parsing the JSONL log files produced by oaieval runs. Each log file contains a series of event records (match, sampling, metrics) followed by a final report entry. The aggregation utilities scan a directory of log files, extract the final_report from each, and optionally extract individual sample-level metrics. This enables cross-eval comparison and reporting after batch runs.
Usage
Use result aggregation after completing a batch of evaluations (typically via oaievalset) to collect and compare results across multiple evals or model configurations.
Theoretical Basis
The aggregation follows a map-reduce pattern:
- Map: For each log file in the directory, parse JSONL and extract the final_report entry
- Reduce: Collect all final reports into a dictionary keyed by file path
- Optional: Extract individual sample-level results by filtering for specific event types