Principle:Openai Evals Eval Set Resolution
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Configuration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A registry lookup mechanism that resolves an eval set name to an ordered list of individual eval names for batch execution.
Description
Eval Set Resolution maps a set name (such as "test-basic") to an EvalSetSpec dataclass containing a sequence of eval names. Eval sets are defined as YAML files in evals/registry/eval_sets/ and provide a way to group related evaluations for batch execution. The resolved list of eval names is then iterated by oaievalset to run each eval sequentially. The repository includes 18 pre-defined eval sets covering basic tests, model-graded evaluations, and domain-specific benchmarks.
Usage
Use eval set resolution when running a batch of related evaluations. This is the core lookup for the oaievalset CLI's second positional argument.
Theoretical Basis
An eval set is a simple grouping mechanism:
# Eval set YAML structure
test-basic:
evals:
- test-match
- test-fuzzy-match
- test-includes
The EvalSetSpec dataclass contains:
- evals — Ordered sequence of eval name strings
- key — Canonical set name
- group — YAML filename grouping