Implementation: OpenAI Evals oaieval.run
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Orchestration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
The run function in the oaieval CLI module is a concrete tool for orchestrating a complete evaluation run.
Description
The oaieval.run function is the primary orchestration entry point. It resolves the completion function and eval from their names, constructs a RunSpec with a unique run_id, builds the appropriate recorder backend (Local, HTTP, Snowflake, or Dummy), instantiates the eval class, executes it, appends token usage to results, and records the final report. The function returns the unique run_id string.
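The orchestration sequence above can be sketched with a minimal, self-contained model. Everything here (the registries, RunSpec fields, and LocalRecorder) is a dummy stand-in for illustration, not the real evals classes:

```python
import uuid
from dataclasses import dataclass

# Hypothetical stand-ins for registry lookups (not the real evals registry)
COMPLETION_FNS = {"dummy-model": lambda prompt: "42"}
EVALS = {"test-match": lambda completion_fn, recorder: {"accuracy": 1.0}}

@dataclass
class RunSpec:
    run_id: str
    completion_fn: str
    eval_name: str

class LocalRecorder:
    """Toy recorder that just holds the final report in memory."""
    def __init__(self, spec: RunSpec):
        self.spec = spec
        self.final_report = None

    def record_final_report(self, report: dict) -> None:
        self.final_report = report

def run(completion_fn_name: str, eval_name: str) -> str:
    # 1. Resolve the completion function and eval from their names
    completion_fn = COMPLETION_FNS[completion_fn_name]
    eval_fn = EVALS[eval_name]
    # 2. Construct a RunSpec with a unique run_id
    spec = RunSpec(uuid.uuid4().hex, completion_fn_name, eval_name)
    # 3. Build the recorder backend (local here; HTTP/Snowflake/Dummy elsewhere)
    recorder = LocalRecorder(spec)
    # 4. Execute the eval and record the final report
    report = eval_fn(completion_fn, recorder)
    recorder.record_final_report(report)
    # 5. Return the unique run_id
    return spec.run_id
```

The real function additionally appends token usage to the results before recording the report; this sketch only shows the control flow.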
Usage
Use this function when running a single evaluation programmatically or via the oaieval CLI. This is invoked by oaieval.main() after argument parsing and is also called by oaievalset for each eval in a set.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/cli/oaieval.py (lines 118-239)
Signature
def run(args: OaiEvalArguments, registry: Optional[Registry] = None) -> str:
    """
    Run a single evaluation.

    Args:
        args: Parsed CLI arguments including completion_fn name, eval name,
            max_samples, seed, record_path, and other configuration.
        registry: Optional Registry instance. If None, creates a new one.

    Returns:
        run_id: Unique identifier string for this evaluation run.
    """
Import
from evals.cli.oaieval import run, OaiEvalArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.completion_fn | str | Yes | Model ID or registered completion function name |
| args.eval | str | Yes | Eval name from registry |
| args.max_samples | Optional[int] | No | Limit number of samples evaluated |
| args.seed | int | No | Random seed (default: 20220722) |
| args.record_path | Optional[str] | No | Custom path for log output (default: /tmp/evallogs/) |
| args.extra_eval_params | str | No | Comma-separated key=value overrides for eval args |
| args.debug | bool | No | Enable debug logging |
| registry | Optional[Registry] | No | Pre-configured Registry instance |
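The extra_eval_params string is comma-separated key=value pairs. A plausible parser for that format is sketched below; the exact coercion rules in oaieval may differ, so treat this as an illustration of the contract, not the library's implementation:

```python
def parse_extra_eval_params(param_str: str) -> dict:
    """Parse overrides like "num_few_shot=3,max_tokens=200" into a dict."""
    if not param_str:
        return {}
    params = {}
    for pair in param_str.split(","):
        key, _, raw = pair.partition("=")
        # Coerce simple literal types; leave everything else as a string
        if raw in ("True", "False"):
            value = raw == "True"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw
        params[key.strip()] = value
    return params
```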
Outputs
| Name | Type | Description |
|---|---|---|
| run_id | str | Unique run identifier (format: YYMMDDHHMMSSxxxxx) |
| JSON log file | File | JSONL log at record_path with all events and final report |
| Console output | Text | Final report metrics printed to console |
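One way to produce an identifier matching the YYMMDDHHMMSSxxxxx format in the table (12-digit timestamp plus a 5-character suffix) is shown below. This is a sketch of the format, not necessarily how oaieval generates its ids:

```python
import random
from datetime import datetime, timezone

def make_run_id() -> str:
    # 12-digit UTC timestamp: YYMMDDHHMMSS
    timestamp = datetime.now(timezone.utc).strftime("%y%m%d%H%M%S")
    # 5-character random suffix (hex here; the real suffix alphabet is assumed)
    suffix = "%05x" % random.randrange(16 ** 5)
    return timestamp + suffix
```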
Usage Examples
CLI Usage
# Basic evaluation
oaieval gpt-3.5-turbo test-match
# With sample limit and custom output
oaieval gpt-4 my-eval --max_samples 100 --record_path ./results.jsonl
# With extra eval parameters
oaieval gpt-3.5-turbo my-eval --extra_eval_params "num_few_shot=3,max_tokens=200"
# Debug mode
oaieval gpt-3.5-turbo test-match --debug
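After a run, the JSONL log at record_path can be inspected to recover the final report. The helper below assumes one JSON object per line with the report carried under a "final_report" key; the exact event schema may vary between evals versions:

```python
import json

def load_final_report(log_path: str) -> dict:
    """Scan a JSONL eval log and return the final report, if present."""
    report = {}
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # The last line carrying a "final_report" key wins
            if "final_report" in event:
                report = event["final_report"]
    return report
```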
Programmatic Usage
from evals.cli.oaieval import run, OaiEvalArguments

# Build the argument object directly instead of going through argparse
args = OaiEvalArguments()
args.completion_fn = "gpt-3.5-turbo"  # model ID or registered completion fn
args.eval = "test-match"              # eval name from the registry
args.max_samples = 10                 # limit the number of samples evaluated
args.seed = 20220722
args.record_path = None               # default log location: /tmp/evallogs/
args.extra_eval_params = ""
args.completion_args = ""
args.cache = True
args.visible = None
args.debug = False
args.local_run = True                 # use the local JSONL recorder
args.http_run = False
args.http_run_url = None
args.http_batch_size = 100
args.http_fail_percent_threshold = 5
args.dry_run = False
args.dry_run_logging = True
args.log_to_file = None
args.registry_path = None
args.user = ""

run_id = run(args)
print(f"Completed run: {run_id}")