
Implementation:Openai Evals Oaievalset Run

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Orchestration
Last Updated: 2026-02-14 10:00 GMT

Overview

A concrete tool for orchestrating batch evaluation runs, provided by the oaievalset CLI module of openai/evals.

Description

The oaievalset.run function resolves an eval set, builds a list of oaieval commands, loads progress from a prior run (if resuming), filters out completed commands, and sequentially executes each remaining command via subprocess.run. It supports pass-through of unknown CLI arguments to the underlying oaieval commands.
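The resolve–filter–execute loop described above can be sketched in standalone form. This is a simplified illustration, not the library's actual code: the function name `run_eval_set` and its parameters are hypothetical stand-ins, and the progress-file format (one completed command per line) is an assumption.

```python
import subprocess
from pathlib import Path


def run_eval_set(eval_commands, progress_path, resume=True, exit_on_error=True):
    """Sequentially run eval commands, skipping any recorded as already done.

    eval_commands: list of argv lists, one per oaieval invocation.
    progress_path: Path to the progress file for this (model, eval_set) pair.
    """
    # Load progress from a prior run, if resuming (assumed format:
    # one completed command per line).
    completed = set()
    if resume and progress_path.exists():
        completed = set(progress_path.read_text().splitlines())

    progress_path.parent.mkdir(parents=True, exist_ok=True)

    for cmd in eval_commands:
        key = " ".join(cmd)
        if key in completed:
            continue  # filtered out: finished in a prior run
        result = subprocess.run(cmd)
        if result.returncode != 0:
            if exit_on_error:
                raise RuntimeError(f"eval command failed: {key}")
            continue  # failure not recorded; it will be retried on resume
        # Record success so a future resumed run can skip this command.
        with progress_path.open("a") as f:
            f.write(key + "\n")
```

Recording progress only after a successful exit is what makes resumption safe: an interrupted or failing command is simply re-run on the next invocation.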

Usage

Use this when running a batch of evaluations against a single model. It is invoked by the oaievalset CLI entry point.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/cli/oaievalset.py (lines 81-131)

Signature

def run(
    args: OaiEvalSetArguments,
    unknown_args: list[str],
    registry: Optional[Registry] = None,
    run_command: str = "oaieval",
) -> None:
    """
    Run all evals in an eval set sequentially.

    Args:
        args: Parsed CLI arguments (model, eval_set, resume, exit_on_error).
        unknown_args: Additional arguments passed through to each oaieval invocation.
        registry: Optional Registry instance.
        run_command: CLI command name (default "oaieval").
    """

Import

from evals.cli.oaievalset import run, OaiEvalSetArguments
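For programmatic use, the documented fields (model, eval_set, resume, exit_on_error) can be assembled into an argument object. The sketch below uses `argparse.Namespace` as a stand-in; the real `OaiEvalSetArguments` type is defined in `evals/cli/oaievalset.py` and may differ in shape.

```python
from argparse import Namespace

# Hypothetical stand-in for OaiEvalSetArguments, using the field
# names documented in this page's I/O contract.
args = Namespace(
    model="gpt-3.5-turbo",
    eval_set="test-basic",
    resume=True,          # pick up where a prior run left off
    exit_on_error=True,   # abort the batch on the first failing eval
)

# Passing this to run(args, unknown_args=["--max_samples", "100"]) would
# then execute every eval in the set sequentially, forwarding the
# unknown_args to each oaieval invocation.
```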

I/O Contract

Inputs

Name               | Type      | Required | Description
args.model         | str       | Yes      | Completion function / model name
args.eval_set      | str       | Yes      | Eval set name from the registry
args.resume        | bool      | No       | Resume from the progress file (default True)
args.exit_on_error | bool      | No       | Stop on first failure (default True)
unknown_args       | list[str] | No       | Pass-through arguments to each oaieval command

Outputs

Name               | Type            | Description
subprocess outputs | JSONL log files | Each oaieval invocation writes its own JSONL log file
progress file      | text file       | Written to /tmp/oaievalset/{model}.{eval_set}.progress.txt
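The progress-file location above can be expressed as a small helper. This is a sketch for illustration: the function names are hypothetical, and the assumption that the file records one finished command per line is not confirmed by this page.

```python
from pathlib import Path


def progress_path(model: str, eval_set: str) -> Path:
    # Mirrors the documented location:
    # /tmp/oaievalset/{model}.{eval_set}.progress.txt
    return Path("/tmp/oaievalset") / f"{model}.{eval_set}.progress.txt"


def completed_commands(path: Path) -> set:
    # Assumption: the progress file lists one completed command per line.
    if not path.exists():
        return set()
    return set(path.read_text().splitlines())
```

Deleting the progress file for a given (model, eval_set) pair forces the next run to start from scratch, since no commands will be filtered as completed.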

Usage Examples

CLI Usage

# Run all evals in a set
oaievalset gpt-3.5-turbo test-basic

# With pass-through arguments
oaievalset gpt-4 test-basic --max_samples 100

# Resume an interrupted run
oaievalset gpt-3.5-turbo test-basic --resume

# Stop on first error (default)
oaievalset gpt-3.5-turbo test-basic --exit-on-error

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
