Principle: OpenAI Evals Result Recording
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Logging |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
An event-based recording system that captures evaluation results, metrics, and metadata across multiple storage backends.
Description
Result Recording provides the infrastructure for persisting evaluation events during and after an eval run. The system defines a RecorderBase abstract class with concrete implementations for local JSON files (LocalRecorder), HTTP endpoints (HttpRecorder), Snowflake databases (Recorder), and no-op testing (DummyRecorder). Event recording is thread-safe and batched for efficiency, and events carry typed categories: match results, sampling data, metrics, embeddings, conditional log probabilities, and error reports. The final aggregated report is recorded separately via record_final_report.
Usage
Result recording is used in every evaluation run. The recorder is constructed by build_recorder based on CLI flags (--local-run, --http-run, --dry-run) and passed to Eval.run(), which uses it throughout sample evaluation.
Theoretical Basis
The recording system follows an append-only event log pattern:
- Events are appended to an in-memory list with thread locking
- Periodic flushing writes accumulated events to the configured backend
- A context manager (as_default_recorder) associates events with sample IDs
- Events are categorized by type: match, sampling, metrics, error, etc.
- The final report aggregates per-sample metrics into summary statistics
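The first four points above can be sketched together: a locked append-only list, a flush threshold, and a context manager that tags events with the current sample ID. This is an illustrative reimplementation of the pattern, assuming thread-local sample tracking; the real as_default_recorder in openai/evals differs in detail.

```python
import contextlib
import threading

_current = threading.local()  # per-thread sample ID, so concurrent samples don't collide

@contextlib.contextmanager
def as_default_recorder(sample_id):
    """Sketch: associate all events recorded inside the block with sample_id."""
    _current.sample_id = sample_id
    try:
        yield
    finally:
        _current.sample_id = None

class EventLog:
    """Append-only, thread-safe event log with batched flushing."""

    def __init__(self, flush_every=100):
        self._events = []
        self._flushed = []          # stand-in for the backend sink
        self._lock = threading.Lock()
        self._flush_every = flush_every

    def record(self, event_type, **data):
        event = {
            "type": event_type,
            "sample_id": getattr(_current, "sample_id", None),
            "data": data,
        }
        with self._lock:
            self._events.append(event)
            if len(self._events) >= self._flush_every:
                self._flush_locked()

    def _flush_locked(self):
        # Periodic flush: move the accumulated batch to the backend.
        self._flushed.extend(self._events)
        self._events.clear()
```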
Event types and their purposes:
- match — Records whether model output matched expected answer
- sampling — Records raw model completions with prompts
- metrics — Records arbitrary key-value metric pairs
- error — Records exceptions during evaluation
- raw_sample — Records unprocessed sample data
- embedding — Records vector embeddings
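Given events in these categories, the final report reduces per-sample results to summary statistics. A minimal sketch of that aggregation, assuming match events carry a boolean "correct" field (the actual record_final_report output format may differ):

```python
def final_report(events):
    """Sketch: aggregate per-sample match events into summary statistics."""
    matches = [e["data"]["correct"] for e in events if e["type"] == "match"]
    return {
        "accuracy": sum(matches) / len(matches) if matches else None,
        "n_samples": len(matches),
    }
```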