Principle:Openai Evals Result Aggregation

Knowledge Sources	OpenAI Evals
Domains	Evaluation, Data_Analysis
Last Updated	2026-02-14 10:00 GMT

Overview

A post-processing pattern that extracts and aggregates final results from evaluation log files across multiple runs.

Description

Result Aggregation provides utilities for parsing the JSONL log files produced by oaieval runs. Each log file contains a series of event records (match, sampling, metrics) followed by a final report entry. The aggregation utilities scan a directory of log files, extract the final_report entry from each, and can optionally extract individual sample-level metrics. Collecting the reports in one place makes it possible to compare results across evals and to build summary reports after a batch run.

Usage

Use result aggregation after completing a batch of evaluations (typically via oaievalset) to collect and compare results across multiple evals or model configurations.

Theoretical Basis

The aggregation follows a map-reduce pattern:

Map: For each log file in the directory, parse JSONL and extract the final_report entry
Reduce: Collect all final reports into a dictionary keyed by file path
Optional: Extract individual sample-level results by filtering for specific event types
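
The three steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names, the "**/*.log" glob pattern, and the assumption that event records carry a "type" field are all hypothetical choices made for the example. Only the final_report key comes from the description above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def extract_final_report(log_path: Path) -> Optional[Dict[str, Any]]:
    """Map step: return the final_report payload of one JSONL log, or None if absent."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the scan
            if isinstance(record, dict) and "final_report" in record:
                return record["final_report"]
    return None


def collect_final_reports(
    log_dir: Path, pattern: str = "**/*.log"  # assumed file extension
) -> Dict[Path, Dict[str, Any]]:
    """Reduce step: gather every final report into a dict keyed by file path."""
    reports: Dict[Path, Dict[str, Any]] = {}
    for path in sorted(log_dir.glob(pattern)):
        report = extract_final_report(path)
        if report is not None:
            reports[path] = report
    return reports


def extract_events(log_path: Path, event_type: str) -> List[Dict[str, Any]]:
    """Optional step: keep only records whose (assumed) "type" field matches."""
    events: List[Dict[str, Any]] = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(record, dict) and record.get("type") == event_type:
                events.append(record)
    return events
```

Keying the result by file path keeps each report traceable to the run that produced it. Logs without a final report (for example, from an interrupted run) are simply left out of the dictionary in this sketch; a stricter variant could raise an error instead.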

Related Pages

Implemented By

Implementation:Openai_Evals_Get_Final_Results_From_Dir
