Principle:Openai Evals Eval Set Resolution

Knowledge Sources	OpenAI Evals
Domains	Evaluation, Configuration
Last Updated	2026-02-14 10:00 GMT

Overview

A registry lookup mechanism that resolves an eval set name to an ordered list of individual eval names for batch execution.

Description

Eval Set Resolution maps a set name (such as "test-basic") to an EvalSetSpec dataclass containing a sequence of eval names. Eval sets are defined as YAML files in evals/registry/eval_sets/ and provide a way to group related evaluations for batch execution. The resolved list of eval names is then iterated by oaievalset to run each eval sequentially. The repository includes 18 pre-defined eval sets covering basic tests, model-graded evaluations, and domain-specific benchmarks.

Usage

Use eval set resolution when running a batch of related evaluations. The oaievalset CLI performs this lookup on its second positional argument, the eval set name, before running anything.
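As a rough illustration of what happens after resolution, the sketch below turns a resolved list of eval names into one command per eval, in order. It assumes each eval is launched as `oaieval <model> <eval_name>`. The helper name `build_commands` and the model name are hypothetical, and nothing is actually executed.

```python
from typing import Iterable, Iterator, List


def build_commands(model: str, eval_names: Iterable[str]) -> Iterator[List[str]]:
    """Yield one argv list per eval, preserving the resolved order.

    Assumption: each eval runs as `oaieval <model> <eval_name>`.
    """
    for eval_name in eval_names:
        yield ["oaieval", model, eval_name]


# Hypothetical model name; the eval names come from the resolved eval set.
commands = list(build_commands("gpt-3.5-turbo", ["test-match", "test-fuzzy-match"]))
for argv in commands:
    print(" ".join(argv))
```

Keeping command construction separate from execution makes the ordering easy to inspect before any eval is started.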

Theoretical Basis

An eval set is a simple grouping mechanism:

# Eval set YAML structure
test-basic:
  evals:
    - test-match
    - test-fuzzy-match
    - test-includes

The EvalSetSpec dataclass contains:

evals: Ordered sequence of eval name strings
key: Canonical set name (the top-level YAML key, such as "test-basic")
group: Name of the YAML file in which the set is defined
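The lookup itself can be sketched in a few lines. This is a minimal stand-in, not the library's implementation: the `EvalSetSpec` class below only mirrors the three fields listed above, `resolve_eval_set` is a hypothetical helper, and the YAML files are represented as an already-parsed dictionary keyed by an assumed file stem ("test"). Returning None for an unknown name is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass
class EvalSetSpec:
    """Stand-in with the three fields described above."""
    evals: Sequence[str]
    key: Optional[str] = None
    group: Optional[str] = None


def resolve_eval_set(
    name: str, parsed_files: Mapping[str, Mapping[str, Any]]
) -> Optional[EvalSetSpec]:
    """Find `name` among parsed eval set files.

    `parsed_files` maps a file stem (assumed) to that file's parsed YAML content.
    """
    for group, sets in parsed_files.items():
        if name in sets:
            # Copy the list so callers cannot mutate the registry data.
            return EvalSetSpec(evals=list(sets[name]["evals"]), key=name, group=group)
    return None


# In-memory equivalent of the YAML structure shown earlier.
PARSED = {
    "test": {
        "test-basic": {"evals": ["test-match", "test-fuzzy-match", "test-includes"]},
    },
}

spec = resolve_eval_set("test-basic", PARSED)
print(spec)
```

The resolved `spec.evals` keeps the order in which the names appear in the YAML, which is the order a batch runner would iterate.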

Related Pages

Implemented By

Implementation:Openai_Evals_Registry_Get_Eval_Set
