Implementation:Openai Evals Oaievalset Run

Knowledge Sources	OpenAI Evals
Domains	Evaluation, Orchestration
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete tool, provided by the oaievalset CLI module, for running every eval in an eval set as one sequential batch.

Description

The oaievalset.run function resolves an eval set, builds a list of oaieval commands, loads progress from a prior run (if resuming), filters out completed commands, and sequentially executes each remaining command via subprocess.run. It supports pass-through of unknown CLI arguments to the underlying oaieval commands.
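
The resume, filter, and execute steps described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names (load_completed, save_completed, run_batch) are hypothetical, the on-disk layout of one JSON-encoded command per line is an assumption, and the handling of exit_on_error follows the description in this page rather than the source.

```python
import json
import subprocess
from pathlib import Path


def load_completed(progress_file: Path) -> list[list[str]]:
    """Return commands recorded by a prior run, or [] if there is no file."""
    if not progress_file.exists():
        return []
    with progress_file.open() as f:
        # Assumed layout: one JSON-encoded command (a list of strings) per line.
        return [json.loads(line) for line in f if line.strip()]


def save_completed(progress_file: Path, completed: list[list[str]]) -> None:
    """Rewrite the progress file with every command completed so far."""
    progress_file.parent.mkdir(parents=True, exist_ok=True)
    with progress_file.open("w") as f:
        for command in completed:
            f.write(json.dumps(command) + "\n")


def run_batch(
    commands: list[list[str]],
    progress_file: Path,
    resume: bool = True,
    exit_on_error: bool = True,
) -> list[list[str]]:
    """Run pending commands in order; return the ones that succeeded this time."""
    completed = load_completed(progress_file) if resume else []
    # Skip anything a prior run already finished.
    pending = [c for c in commands if c not in completed]
    succeeded = []
    for command in pending:
        result = subprocess.run(command)
        if result.returncode != 0:
            if exit_on_error:
                raise subprocess.CalledProcessError(result.returncode, command)
            continue  # leave the failed command unrecorded so a resume retries it
        completed.append(command)
        succeeded.append(command)
        # Persist after every success so an interruption loses at most one eval.
        save_completed(progress_file, completed)
    return succeeded
```

Saving progress after each command, rather than once at the end, is what makes an interrupted batch resumable from the last finished eval.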

Usage

Use this when running a batch of evaluations against a single model. The function is invoked by the oaievalset CLI entry point.

Code Reference

Source Location

Repository: openai/evals
File: evals/cli/oaievalset.py (lines 81-131)

Signature

def run(
    args: OaiEvalSetArguments,
    unknown_args: list[str],
    registry: Optional[Registry] = None,
    run_command: str = "oaieval",
) -> None:
    """
    Run all evals in an eval set sequentially.

    Args:
        args: Parsed CLI arguments (model, eval_set, resume, exit_on_error).
        unknown_args: Additional arguments passed through to each oaieval invocation.
        registry: Optional Registry instance.
        run_command: CLI command name (default "oaieval").
    """

Import

from evals.cli.oaievalset import run, OaiEvalSetArguments

I/O Contract

Inputs

Name	Type	Required	Description
args.model	str	Yes	Completion function / model name
args.eval_set	str	Yes	Eval set name from registry
args.resume	bool	No	Resume from progress file (default True)
args.exit_on_error	bool	No	Stop on first failure (default True)
unknown_args	list[str]	No	Pass-through arguments to each oaieval command
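
To illustrate how the inputs above might be parsed and turned into per-eval commands, here is a small sketch using the standard library's argparse. The parser layout (including the use of BooleanOptionalAction for the two flags), the build_commands helper, and the eval names eval-a and eval-b are assumptions made for the example; only the argument names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented inputs and defaults."""
    parser = argparse.ArgumentParser(prog="oaievalset")
    parser.add_argument("model")
    parser.add_argument("eval_set")
    # BooleanOptionalAction also generates --no-resume / --no-exit-on-error.
    parser.add_argument("--resume", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument(
        "--exit-on-error", action=argparse.BooleanOptionalAction, default=True
    )
    return parser


def build_commands(model, eval_names, unknown_args, run_command="oaieval"):
    """Build one command per eval, appending pass-through args and skipping duplicates."""
    commands = []
    for name in eval_names:
        command = [run_command, model, name] + unknown_args
        if command not in commands:
            commands.append(command)
    return commands


# parse_known_args returns the recognized arguments plus everything it did not recognize.
args, unknown_args = build_parser().parse_known_args(
    ["gpt-4", "test-basic", "--max_samples", "100"]
)
# "eval-a" and "eval-b" stand in for the evals that a registry lookup would return.
commands = build_commands(args.model, ["eval-a", "eval-b"], unknown_args)
```

Because unrecognized arguments are collected rather than rejected, options meant for oaieval can be given on the oaievalset command line and are appended unchanged to every generated command.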

Outputs

Name	Type	Description
subprocess outputs	—	Each oaieval produces its own JSONL log file
progress file	JSON file	At /tmp/oaievalset/{model}.{eval_set}.progress.txt

Usage Examples

CLI Usage

# Run all evals in a set
oaievalset gpt-3.5-turbo test-basic

# With pass-through arguments
oaievalset gpt-4 test-basic --max_samples 100

# Resume an interrupted run (resume defaults to True, so the flag only makes it explicit)
oaievalset gpt-3.5-turbo test-basic --resume

# Stop on first error (default)
oaievalset gpt-3.5-turbo test-basic --exit-on-error

Related Pages

Implements Principle

Principle:Openai_Evals_Batch_Eval_Execution

Requires Environment

Uses Heuristic
