Principle:Openai Evals Eval Resolution
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Configuration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A registry lookup mechanism that resolves an eval name string to a structured specification describing which eval class and arguments to use.
Description
Eval Resolution converts a user-provided eval name (such as "test-match") into an EvalSpec dataclass containing the fully-qualified class path, constructor arguments, and registry metadata. The registry loads all YAML files from evals/registry/evals/ directories and supports alias dereferencing, where an eval name can point to another eval name. The resolved EvalSpec is then used to instantiate the actual Eval class for execution.
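The final step described above, turning the class path held in a spec into a live object, can be sketched with a dynamic import. This is a minimal illustration rather than the library's own code: `instantiate_from_path` is a hypothetical helper, and a standard-library class stands in for an eval class so the snippet runs without evals installed.

```python
import importlib
from typing import Any, Dict, Optional


def instantiate_from_path(cls_path: str, args: Optional[Dict[str, Any]] = None) -> Any:
    """Import `module.path.ClassName` and call it with keyword arguments.

    Hypothetical helper for illustration; not part of the evals API.
    """
    # Split "pkg.module.ClassName" into "pkg.module" and "ClassName".
    module_name, _, class_name = cls_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {cls_path!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**(args or {}))


# A stdlib class stands in for an eval class such as
# "evals.elsuite.basic.match.Match".
value = instantiate_from_path(
    "fractions.Fraction", {"numerator": 3, "denominator": 4}
)
print(value)
```

Keeping the class path as a string in the spec means the registry can be loaded and inspected without importing every eval module up front.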
Usage
Use eval resolution whenever a named evaluation must be looked up before execution. The eval name is the second positional argument to the oaieval CLI, and oaievalset resolves each eval name the same way as it iterates over an eval set.
Theoretical Basis
The resolution process follows a dictionary-based lookup with alias support:
# Pseudocode for eval resolution
def resolve_eval(name):
    raw_registry = load_yaml_files("evals/registry/evals/")
    while raw_registry[name] is alias:
        name = raw_registry[name]
    spec = raw_registry[name]
    return EvalSpec(cls=spec["cls"], args=spec.get("args"), ...)
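A runnable version of the lookup might look like the sketch below. It makes several assumptions that are not confirmed by the text above: the registry is an in-memory dict rather than parsed YAML, an alias is modeled as a plain string naming another entry (the on-disk alias format may differ), and the example entries are illustrative. A cycle guard is added so that mutually referencing aliases fail loudly instead of looping forever.

```python
from typing import Any, Dict, Union

# Hypothetical in-memory registry standing in for the parsed YAML files.
# Assumption: an alias is a plain string that names another entry.
RawRegistry = Dict[str, Union[str, Dict[str, Any]]]

RAW_REGISTRY: RawRegistry = {
    "test-match": "test-match.s1.simple-v0",  # alias (illustrative)
    "test-match.s1.simple-v0": {
        "cls": "evals.elsuite.basic.match.Match",
        "args": {"samples_jsonl": "test_match/samples.jsonl"},
    },
}


def resolve_eval(name: str, raw_registry: RawRegistry) -> Dict[str, Any]:
    """Follow aliases until a concrete entry is reached, then return it.

    Raises KeyError for an unknown name and ValueError for an alias cycle.
    """
    seen = set()
    while isinstance(raw_registry[name], str):
        if name in seen:
            raise ValueError(f"alias cycle detected at {name!r}")
        seen.add(name)
        name = raw_registry[name]
    entry = raw_registry[name]
    # "key" records the canonical name reached after dereferencing.
    return {"key": name, "cls": entry["cls"], "args": entry.get("args")}


spec = resolve_eval("test-match", RAW_REGISTRY)
print(spec["key"], spec["cls"])
```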
The EvalSpec dataclass contains:
- cls: Fully qualified class path (e.g. "evals.elsuite.basic.match.Match")
- args: Constructor keyword arguments, including the dataset path
- registry_path: Path to the registry that defined this eval
- key: The canonical eval name, after any aliases have been resolved
- group: Group name taken from the YAML filename that defined the eval
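The fields listed above could be modeled roughly as follows. `EvalSpecSketch` and `build_spec` are hypothetical names used only for illustration, the field types and defaults are guesses, and deriving `group` from the YAML filename stem is an assumption based on the description of that field.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class EvalSpecSketch:
    """Illustrative stand-in mirroring the fields listed above."""

    cls: str
    registry_path: Path
    args: Optional[Dict[str, Any]] = None
    key: Optional[str] = None
    group: Optional[str] = None


def build_spec(
    key: str, entry: Dict[str, Any], yaml_path: Path, registry_path: Path
) -> EvalSpecSketch:
    """Attach registry metadata to a raw entry (hypothetical helper)."""
    return EvalSpecSketch(
        cls=entry["cls"],
        args=entry.get("args"),
        registry_path=registry_path,
        key=key,
        # Assumption: the group is the YAML filename without its extension.
        group=yaml_path.stem,
    )


spec = build_spec(
    key="test-match.s1.simple-v0",
    entry={
        "cls": "evals.elsuite.basic.match.Match",
        "args": {"samples_jsonl": "test_match/samples.jsonl"},
    },
    yaml_path=Path("evals/registry/evals/test-basic.yaml"),
    registry_path=Path("evals/registry"),
)
print(spec.group)
```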