Implementation: OpenAI Evals Eval Base Class
| Knowledge Sources | Details |
|---|---|
| Domains | Evaluation, Software_Architecture |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Abstract base class, provided by the evals framework, for defining custom evaluations.
Description
The Eval class is the primary abstract base class that all evaluation implementations must extend. It provides the infrastructure for parallel sample evaluation, data loading, recorder integration, and completion-function management. Two abstract methods must be implemented: eval_sample (per-sample logic) and run (overall orchestration). Samples are evaluated in parallel on a ThreadPool whose thread count is configurable via the EVALS_THREADS environment variable (default: 10). The companion SolverEval class extends Eval for stateful, solver-based evaluations.
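The parallel loop can be pictured with the following sketch. This is illustrative code based on the description above, not the framework's source: the real eval_all_samples also wires each sample into the recorder and enforces per-thread timeouts, and the per-sample seeding scheme shown here is an assumption.
```python
import os
import random
from multiprocessing.pool import ThreadPool

def eval_all_samples_sketch(eval_obj, samples):
    """Illustrative sketch of Eval.eval_all_samples (simplified, assumed)."""
    threads = int(os.environ.get("EVALS_THREADS", "10"))  # default: 10 workers

    def run_one(args):
        idx, sample = args
        # Derive a per-sample RNG from the eval-level seed so results are
        # deterministic regardless of thread scheduling (assumed scheme).
        rng = random.Random(f"{eval_obj.seed}:{idx}")
        return idx, eval_obj.eval_sample(sample, rng)

    with ThreadPool(threads) as pool:
        results = pool.map(run_one, list(enumerate(samples)))
    # Restore input order before returning.
    return [out for _, out in sorted(results)]
```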
Usage
Subclass Eval when creating any custom evaluation. Override the eval_sample and run methods. Use self.completion_fn to access the model, and the recorder helper functions to log results.
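For example, to dial parallelism down when the backing API is rate limited (the value below is arbitrary):
```python
import os

# The ThreadPool size is read from EVALS_THREADS (default 10), as noted in
# the description above. Set it before the eval starts.
os.environ["EVALS_THREADS"] = "4"
```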
Code Reference
Source Location
- Repository: openai/evals
- File: evals/eval.py (lines 46-166)
Signature
```python
class Eval(abc.ABC):
    def __init__(
        self,
        completion_fns: list[Union[CompletionFn, Solver]],
        eval_registry_path: Path,
        seed: int = 20220722,
        name: str = "no_name_eval.default",
        registry: Optional[Registry] = None,
        samples_jsonl: Optional[str] = None,
    ):
        """
        Args:
            completion_fns: List of CompletionFn or Solver instances to evaluate.
            eval_registry_path: Path to the registry directory for data lookup.
            seed: Random seed for deterministic shuffling (default: 20220722).
            name: Eval name in format "base_eval.split" (e.g. "my-eval.dev").
            registry: Optional Registry instance for spec lookups.
            samples_jsonl: Optional path to default JSONL dataset.
        """

    @abc.abstractmethod
    def eval_sample(self, sample: Any, rng: random.Random):
        """Evaluate a single sample. Must be implemented by subclasses."""

    @abc.abstractmethod
    def run(self, recorder: RecorderBase) -> Dict[str, float]:
        """Run the evaluation. Must be implemented by subclasses."""

    def eval_all_samples(
        self,
        recorder: RecorderBase,
        samples,
        show_progress=True,
        record_raw_sample=True,
        **_kwargs: Any,
    ):
        """Evaluate all samples in parallel using ThreadPool."""

    def get_samples(self) -> list[dict]:
        """Load samples from self.samples_jsonl."""

    @property
    def completion_fn(self) -> CompletionFn:
        """Helper for ergonomic access to a single CompletionFn."""
```
Import
```python
from evals.eval import Eval, SolverEval
from evals.record import RecorderBase
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| completion_fns | list[Union[CompletionFn, Solver]] | Yes | Model(s) or solver(s) to evaluate |
| eval_registry_path | Path | Yes | Registry path for resolving data file paths |
| seed | int | No | Random seed (default 20220722) |
| name | str | No | Eval name in "base_eval.split" format |
| registry | Optional[Registry] | No | Registry instance for spec lookups |
| samples_jsonl | str | No | Path to JSONL dataset (format sketched below) |
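get_samples does not enforce a schema on samples_jsonl; the input/ideal keys below are the convention that the Usage Examples section relies on. A hedged snippet that writes such a dataset (file path and contents are hypothetical):
```python
import json

# Hypothetical dataset for the ArithmeticEval example below: each JSONL line
# holds a chat-formatted prompt under "input" and the expected answer under
# "ideal".
samples = [
    {"input": [{"role": "user", "content": "2+2="}], "ideal": "4"},
    {"input": [{"role": "user", "content": "7*6="}], "ideal": "42"},
]
with open("arithmetic.jsonl", "w") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")
```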
Outputs
| Name | Type | Description |
|---|---|---|
| run() returns | Dict[str, float] | Aggregated metrics (e.g. {"accuracy": 0.85, "bootstrap_std": 0.02}) |
| Recorded events | via RecorderBase | Match, sampling, and metric events logged during execution |
Usage Examples
Minimal Custom Eval
```python
import evals
import evals.metrics
from evals.eval import Eval
from evals.record import RecorderBase


class ArithmeticEval(Eval):
    def __init__(self, completion_fns, samples_jsonl, *args, **kwargs):
        super().__init__(completion_fns, *args, **kwargs)
        self.samples_jsonl = samples_jsonl

    def eval_sample(self, sample, rng):
        prompt = sample["input"]
        result = self.completion_fn(prompt=prompt, temperature=0.0)
        sampled = result.get_completions()[0]
        evals.record_and_check_match(
            prompt=prompt,
            sampled=sampled,
            expected=sample["ideal"],
        )

    def run(self, recorder: RecorderBase):
        samples = self.get_samples()
        self.eval_all_samples(recorder, samples)
        events = recorder.get_events("match")
        return {
            "accuracy": evals.metrics.get_accuracy(events),
        }
```
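The Outputs table above also mentions bootstrap_std. A hedged variant of run, assuming evals.metrics provides get_bootstrap_accuracy_std alongside get_accuracy (a helper commonly used in this repository's evals):
```python
class ArithmeticEvalWithStd(ArithmeticEval):
    def run(self, recorder: RecorderBase):
        samples = self.get_samples()
        self.eval_all_samples(recorder, samples)
        events = recorder.get_events("match")
        return {
            "accuracy": evals.metrics.get_accuracy(events),
            # Assumed helper: bootstrap estimate of the accuracy's standard error.
            "bootstrap_std": evals.metrics.get_bootstrap_accuracy_std(events),
        }
```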
Registration for Custom Eval
```yaml
# In evals/registry/evals/my_eval.yaml
arithmetic:
  id: arithmetic.dev.v0
  metrics: [accuracy]

arithmetic.dev.v0:
  class: my_evals.arithmetic.ArithmeticEval
  args:
    samples_jsonl: my_data/arithmetic.jsonl
```
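Once registered, the eval is typically launched with the oaieval CLI, e.g. `oaieval gpt-3.5-turbo arithmetic`; if the YAML lives outside the default registry, point the CLI at your registry directory (recent versions accept a `--registry_path` argument, but check the flags supported by your installed version).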