Principle:Openai Evals Result Recording

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Logging
Last Updated: 2026-02-14 10:00 GMT

Overview

An event-based recording system that captures evaluation results, metrics, and metadata across multiple storage backends.

Description

Result Recording provides the infrastructure for persisting evaluation events during and after an eval run. The system defines an abstract RecorderBase class with concrete implementations for local JSON files (LocalRecorder), HTTP endpoints (HttpRecorder), Snowflake databases (Recorder), and a no-op recorder for testing and dry runs (DummyRecorder). Event recording is thread-safe and batched for efficiency, and each event carries a type: match results, sampling data, metrics, embeddings, conditional log probabilities, or error reports. The final aggregated report is recorded separately via record_final_report.
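
The split between a shared base class and interchangeable backends can be illustrated with a minimal sketch. The class names below (SketchRecorderBase, JsonlFileRecorder, NullRecorder), the event field names, and the one-JSON-object-per-line file layout are assumptions made for illustration; they are not the classes or the file format of the evals library.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class SketchRecorderBase(ABC):
    """Hypothetical stand-in for an abstract recorder (not the evals class)."""

    def __init__(self):
        self.events = []

    def record_event(self, event_type, data):
        # Every backend shares the same in-memory append step.
        self.events.append(
            {"event_id": len(self.events), "type": event_type, "data": data}
        )

    @abstractmethod
    def flush(self):
        """Persist pending events; each backend decides how."""


class JsonlFileRecorder(SketchRecorderBase):
    """Appends one JSON object per line to a local file (assumed layout)."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        self._written = 0  # number of events already on disk

    def flush(self):
        pending = self.events[self._written:]
        with open(self.path, "a", encoding="utf-8") as f:
            for event in pending:
                f.write(json.dumps(event) + "\n")
        self._written = len(self.events)


class NullRecorder(SketchRecorderBase):
    """Keeps events in memory only; flushing persists nothing."""

    def flush(self):
        pass


# Usage: the calling code is identical for both backends.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl")
    recorder = JsonlFileRecorder(path)
    recorder.record_event("match", {"correct": True})
    recorder.record_event("metrics", {"accuracy": 1.0})
    recorder.flush()
    with open(path, encoding="utf-8") as f:
        lines = [json.loads(line) for line in f]

null_recorder = NullRecorder()
null_recorder.record_event("match", {"correct": False})
null_recorder.flush()
```

Because callers only touch record_event and flush, swapping the storage backend does not change the evaluation code.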

Usage

Every evaluation run records results. build_recorder constructs the recorder according to CLI flags (--local-run, --http-run, --dry-run), and the recorder is then passed to Eval.run(), which uses it throughout sample evaluation.
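
As a rough illustration of flag-driven backend selection, the sketch below parses the three flags named above and maps them to a backend label. The helper choose_backend, the precedence order among the flags, and the fallback to the Snowflake backend when no flag is given are all assumptions; the actual logic of build_recorder may differ.

```python
import argparse


def choose_backend(args: argparse.Namespace) -> str:
    """Hypothetical selector: returns a backend label, not a recorder object."""
    # Assumed precedence: dry run first, then local, then HTTP.
    if args.dry_run:
        return "dummy"
    if args.local_run:
        return "local"
    if args.http_run:
        return "http"
    return "snowflake"  # assumed fallback when no flag is set


parser = argparse.ArgumentParser()
parser.add_argument("--local-run", action="store_true")
parser.add_argument("--http-run", action="store_true")
parser.add_argument("--dry-run", action="store_true")

backend = choose_backend(parser.parse_args(["--dry-run"]))
print(backend)
```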

Theoretical Basis

The recording system follows an append-only event log pattern:

  1. Events are appended to an in-memory list with thread locking
  2. Periodic flushing writes accumulated events to the configured backend
  3. A context manager (as_default_recorder) associates events with sample IDs
  4. Events are categorized by type: match, sampling, metrics, error, etc.
  5. The final report aggregates per-sample metrics into summary statistics

Event types and their purposes:

  • match — Records whether model output matched expected answer
  • sampling — Records raw model completions with prompts
  • metrics — Records arbitrary key-value metric pairs
  • error — Records exceptions during evaluation
  • raw_sample — Records unprocessed sample data
  • embedding — Records vector embeddings
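
To show how typed events might feed a final report, the sketch below filters a small hand-written event list by type and derives summary statistics. The event dictionaries, their field names (correct, expected, picked, prompt, sampled), and the summarize helper are illustrative assumptions rather than the library's schema.

```python
from collections import Counter

# Hand-written example events; field names are assumed for illustration.
events = [
    {"sample_id": "s0", "type": "match",
     "data": {"correct": True, "expected": "4", "picked": "4"}},
    {"sample_id": "s1", "type": "sampling",
     "data": {"prompt": "2+2=", "sampled": "5"}},
    {"sample_id": "s1", "type": "match",
     "data": {"correct": False, "expected": "4", "picked": "5"}},
    {"sample_id": "s2", "type": "error", "data": {"error": "timeout"}},
    {"sample_id": "s2", "type": "metrics", "data": {"latency_s": 1.5}},
]


def events_of_type(all_events, event_type):
    """Return only the events with the given type."""
    return [e for e in all_events if e["type"] == event_type]


def summarize(all_events):
    """Aggregate per-sample events into a small report dictionary."""
    matches = events_of_type(all_events, "match")
    correct = sum(1 for e in matches if e["data"]["correct"])
    return {
        "accuracy": correct / len(matches) if matches else None,
        "counts": dict(Counter(e["type"] for e in all_events)),
    }


report = summarize(events)
print(report)
```

Keeping the type as a plain field on each event means the same log can serve both per-sample inspection and aggregate reporting.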
