Implementation:Openai Evals Oaievalset Run
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Orchestration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A concrete tool, provided by the oaievalset CLI module, for orchestrating batch evaluation runs.
Description
The oaievalset.run function resolves an eval set, builds a list of oaieval commands, loads progress from a prior run (if resuming), filters out completed commands, and sequentially executes each remaining command via subprocess.run. It supports pass-through of unknown CLI arguments to the underlying oaieval commands.
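The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in evals/cli/oaievalset.py: the helper names (`build_commands`, `run_remaining`), the command layout, and the in-memory `completed` list are assumptions made for the example. The runner is injectable so the sketch can be exercised without invoking oaieval.

```python
import subprocess
from typing import Callable, Sequence

Command = list[str]


def build_commands(
    model: str,
    eval_names: Sequence[str],
    unknown_args: Sequence[str],
    run_command: str = "oaieval",
) -> list[Command]:
    """Build one command per eval, appending pass-through arguments to each."""
    # Assumed layout: <run_command> <model> <eval name> [pass-through args...]
    return [[run_command, model, name, *unknown_args] for name in eval_names]


def run_remaining(
    commands: Sequence[Command],
    completed: list[Command],
    exit_on_error: bool = True,
    runner: Callable[[Command], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> list[Command]:
    """Run commands not yet completed, one at a time; return those that failed."""
    failed: list[Command] = []
    for cmd in commands:
        if cmd in completed:
            continue  # finished in a prior run, so skip it when resuming
        if runner(cmd) == 0:
            completed.append(cmd)  # a zero exit code counts as success
        else:
            failed.append(cmd)
            if exit_on_error:
                break  # stop at the first failure
    return failed
```

Passing a stub as `runner` makes it possible to check the skip and stop-on-error behavior without starting any subprocess.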
Usage
Use this when running a batch of evaluations against a single model. It is invoked by the oaievalset CLI entry point.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/cli/oaievalset.py (lines 81-131)
Signature
def run(
    args: OaiEvalSetArguments,
    unknown_args: list[str],
    registry: Optional[Registry] = None,
    run_command: str = "oaieval",
) -> None:
    """
    Run all evals in an eval set sequentially.

    Args:
        args: Parsed CLI arguments (model, eval_set, resume, exit_on_error).
        unknown_args: Additional arguments passed through to each oaieval invocation.
        registry: Optional Registry instance.
        run_command: CLI command name (default "oaieval").
    """
Import
from evals.cli.oaievalset import run, OaiEvalSetArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.model | str | Yes | Completion function / model name |
| args.eval_set | str | Yes | Eval set name from registry |
| args.resume | bool | No | Resume from progress file (default True) |
| args.exit_on_error | bool | No | Stop on first failure (default True) |
| unknown_args | list[str] | No | Pass-through arguments to each oaieval command |
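One common way to separate recognized options from pass-through arguments is argparse's `parse_known_args`, which returns the unrecognized tokens as a separate list. The parser below is a simplified, hypothetical stand-in modeled on the inputs table. It does not reproduce the real oaievalset parser, and the use of `BooleanOptionalAction` for the two flags is an assumption.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    # Hypothetical, simplified parser modeled on the inputs table above.
    parser = argparse.ArgumentParser(prog="oaievalset-sketch")
    parser.add_argument("model", type=str)
    parser.add_argument("eval_set", type=str)
    # BooleanOptionalAction (Python 3.9+) also generates --no-<flag> variants.
    parser.add_argument("--resume", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--exit-on-error", action=argparse.BooleanOptionalAction, default=True)
    return parser


def split_args(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    """Return (known arguments, tokens to forward to each oaieval command)."""
    args, unknown_args = make_parser().parse_known_args(argv)
    return args, unknown_args
```

With this sketch, `split_args(["gpt-4", "test-basic", "--max_samples", "100"])` leaves `--max_samples 100` in the second return value, ready to be appended to each command.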
Outputs
| Name | Type | Description |
|---|---|---|
| subprocess outputs | — | Each oaieval produces its own JSONL log file |
| progress file | JSON file | Written to /tmp/oaievalset/{model}.{eval_set}.progress.txt; tracks completed commands so an interrupted run can resume |
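How a progress file can support resuming is illustrated by the small sketch below. It assumes the file stores one JSON-encoded command per line. That layout, and the `ProgressFile` class itself, are assumptions made for the example rather than a description of the actual file format.

```python
import json
from pathlib import Path


class ProgressFile:
    """Hypothetical helper that records completed commands on disk."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.completed: list[list[str]] = []

    def load(self) -> bool:
        """Read previously completed commands; return True if any were found."""
        if not self.path.exists():
            return False
        with self.path.open() as f:
            self.completed = [json.loads(line) for line in f if line.strip()]
        return len(self.completed) > 0

    def add(self, command: list[str]) -> None:
        """Mark a command as completed and persist the full list immediately."""
        self.completed.append(command)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("w") as f:
            for item in self.completed:
                f.write(json.dumps(item) + "\n")
```

Persisting after every completed command means an interruption loses at most the command that was in flight.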
Usage Examples
CLI Usage
# Run all evals in a set
oaievalset gpt-3.5-turbo test-basic
# With pass-through arguments
oaievalset gpt-4 test-basic --max_samples 100
# Resume an interrupted run (resuming is enabled by default)
oaievalset gpt-3.5-turbo test-basic --resume
# Stop on first error (default)
oaievalset gpt-3.5-turbo test-basic --exit-on-error
Related Pages
Implements Principle
Requires Environment
Uses Heuristic