
Implementation:Openai Evals Oaieval Run

From Leeroopedia
Knowledge Sources
Domains Evaluation, Orchestration
Last Updated 2026-02-14 10:00 GMT

Overview

The concrete entry point for orchestrating a complete evaluation run, provided by the oaieval CLI module.

Description

The oaieval.run function is the primary orchestration entry point. It resolves the completion function and eval from their names, constructs a RunSpec with a unique run_id, builds the appropriate recorder backend (Local, HTTP, Snowflake, or Dummy), instantiates the eval class, executes it, appends token usage to results, and records the final report. The function returns the unique run_id string.
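The flow described above can be sketched in simplified Python. Every name below (DummyRecorder, the uuid-based run_id) is an illustrative stand-in, not the library's actual internals:

```python
import uuid


class DummyRecorder:
    """Stand-in for the recorder backends (Local, HTTP, Snowflake, Dummy)."""

    def __init__(self):
        self.final_report = None

    def record_final_report(self, report: dict) -> None:
        # Persist the final metrics for this run (here: just keep in memory).
        self.final_report = report


def run_sketch(eval_fn, recorder: DummyRecorder) -> str:
    """Mirror run()'s shape: execute the eval, record the report, return the id."""
    run_id = uuid.uuid4().hex             # real CLI uses a timestamp-based id
    report = eval_fn()                    # instantiate and execute the eval
    recorder.record_final_report(report)  # append the final report to the log
    return run_id                         # caller receives the unique run_id
```

run(args) performs the same three moves, build, execute, record, then returns the identifier so callers such as oaievalset can locate the corresponding log.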

Usage

Use this function when running a single evaluation programmatically or via the oaieval CLI. This is invoked by oaieval.main() after argument parsing and is also called by oaievalset for each eval in a set.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/cli/oaieval.py (lines 118-239)

Signature

def run(args: OaiEvalArguments, registry: Optional[Registry] = None) -> str:
    """
    Run a single evaluation.

    Args:
        args: Parsed CLI arguments including completion_fn name, eval name,
              max_samples, seed, record_path, and other configuration.
        registry: Optional Registry instance. If None, creates a new one.

    Returns:
        run_id: Unique identifier string for this evaluation run.
    """

Import

from evals.cli.oaieval import run, OaiEvalArguments

I/O Contract

Inputs

Name                   | Type               | Required | Description
args.completion_fn     | str                | Yes      | Model ID or registered completion function name
args.eval              | str                | Yes      | Eval name from registry
args.max_samples       | Optional[int]      | No       | Limit number of samples evaluated
args.seed              | int                | No       | Random seed (default: 20220722)
args.record_path       | Optional[str]      | No       | Custom path for log output (default: /tmp/evallogs/)
args.extra_eval_params | str                | No       | Comma-separated key=value overrides for eval args
args.debug             | bool               | No       | Enable debug logging
registry               | Optional[Registry] | No       | Pre-configured Registry instance
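extra_eval_params is a comma-separated list of key=value pairs. A minimal parser for that shape, written here from the description rather than copied from the library, might look like:

```python
def parse_extra_eval_params(raw: str) -> dict:
    """Parse a comma-separated "key=value" string into a dict.

    Values are coerced to int where possible, mirroring the typical
    CLI-override pattern; this is an illustrative helper, not the
    library's own parser.
    """
    params = {}
    if not raw:
        return params
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        try:
            params[key.strip()] = int(value)
        except ValueError:
            params[key.strip()] = value.strip()
    return params
```

For example, "num_few_shot=3,max_tokens=200" parses to a dict with the integer values 3 and 200.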

Outputs

Name           | Type | Description
run_id         | str  | Unique run identifier (format: YYMMDDHHMMSSxxxxx)
JSON log file  | File | JSONL log at record_path with all events and the final report
Console output | Text | Final report metrics printed to the console
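The run_id format above, a YYMMDDHHMMSS timestamp plus a short suffix, can be reproduced with a small helper. The 5-character hex tail is an assumption made here for illustration; the library's actual suffix alphabet may differ:

```python
import random
from datetime import datetime, timezone
from typing import Optional


def make_run_id(now: Optional[datetime] = None) -> str:
    """Build a run_id of the form YYMMDDHHMMSS + 5 random characters.

    The 12-digit timestamp matches the documented format; the random
    tail is assumed to guard against collisions between concurrent runs.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%y%m%d%H%M%S")  # YYMMDDHHMMSS
    tail = "".join(random.choices("0123456789abcdef", k=5))
    return stamp + tail
```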

Usage Examples

CLI Usage

# Basic evaluation
oaieval gpt-3.5-turbo test-match

# With sample limit and custom output
oaieval gpt-4 my-eval --max_samples 100 --record_path ./results.jsonl

# With extra eval parameters
oaieval gpt-3.5-turbo my-eval --extra_eval_params "num_few_shot=3,max_tokens=200"

# Debug mode
oaieval gpt-3.5-turbo test-match --debug

Programmatic Usage

from evals.cli.oaieval import run, OaiEvalArguments

args = OaiEvalArguments()
args.completion_fn = "gpt-3.5-turbo"
args.eval = "test-match"
args.max_samples = 10
args.seed = 20220722
args.record_path = None
args.extra_eval_params = ""
args.completion_args = ""
args.cache = True
args.visible = None
args.debug = False
args.local_run = True
args.http_run = False
args.http_run_url = None
args.http_batch_size = 100
args.http_fail_percent_threshold = 5
args.dry_run = False
args.dry_run_logging = True
args.log_to_file = None
args.registry_path = None
args.user = ""

run_id = run(args)
print(f"Completed run: {run_id}")
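After the run completes, the JSONL log at record_path holds one JSON object per line. Assuming the final report appears as a line carrying a final_report key (an assumption about the log layout, not a documented guarantee), it can be read back like this:

```python
import json
from typing import Optional


def read_final_report(log_path: str) -> Optional[dict]:
    """Scan a JSONL eval log and return the final_report payload, if any."""
    report = None
    with open(log_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            obj = json.loads(line)
            if "final_report" in obj:
                # Keep the last occurrence, in case the log has several.
                report = obj["final_report"]
    return report
```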

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
