Implementation: OpenAI Evals oaieval.run
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Orchestration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
The run function in the oaieval CLI module is a concrete tool for orchestrating a complete evaluation run.
Description
The oaieval.run function is the primary orchestration entry point. It resolves the completion function and eval from their names, constructs a RunSpec with a unique run_id, builds the appropriate recorder backend (Local, HTTP, Snowflake, or Dummy), instantiates the eval class, executes it, appends token usage to results, and records the final report. The function returns the unique run_id string.
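The orchestration sequence above can be sketched with a minimal, self-contained model. Everything here (the registries, RunSpec fields, and LocalRecorder) is a dummy stand-in for illustration, not the real evals classes:

```python
import uuid
from dataclasses import dataclass

# Hypothetical stand-ins for registry lookups (not the real evals registry)
COMPLETION_FNS = {"dummy-model": lambda prompt: "42"}
EVALS = {"test-match": lambda completion_fn, recorder: {"accuracy": 1.0}}

@dataclass
class RunSpec:
    run_id: str
    completion_fn: str
    eval_name: str

class LocalRecorder:
    """Toy recorder that just holds the final report in memory."""
    def __init__(self, spec: RunSpec):
        self.spec = spec
        self.final_report = None

    def record_final_report(self, report: dict) -> None:
        self.final_report = report

def run(completion_fn_name: str, eval_name: str) -> str:
    # 1. Resolve the completion function and eval from their names
    completion_fn = COMPLETION_FNS[completion_fn_name]
    eval_fn = EVALS[eval_name]
    # 2. Construct a RunSpec with a unique run_id
    spec = RunSpec(uuid.uuid4().hex, completion_fn_name, eval_name)
    # 3. Build the recorder backend (local here; HTTP/Snowflake/Dummy elsewhere)
    recorder = LocalRecorder(spec)
    # 4. Execute the eval and record the final report
    report = eval_fn(completion_fn, recorder)
    recorder.record_final_report(report)
    # 5. Return the unique run_id
    return spec.run_id
```

The real function additionally appends token usage to the results before recording the report; this sketch only shows the control flow.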
Usage
Use this function when running a single evaluation programmatically or via the oaieval CLI. This is invoked by oaieval.main() after argument parsing and is also called by oaievalset for each eval in a set.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/cli/oaieval.py (lines 118-239)
Signature
def run(args: OaiEvalArguments, registry: Optional[Registry] = None) -> str:
    """
    Run a single evaluation.

    Args:
        args: Parsed CLI arguments including completion_fn name, eval name,
            max_samples, seed, record_path, and other configuration.
        registry: Optional Registry instance. If None, creates a new one.

    Returns:
        run_id: Unique identifier string for this evaluation run.
    """
Import
from evals.cli.oaieval import run, OaiEvalArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.completion_fn | str | Yes | Model ID or registered completion function name |
| args.eval | str | Yes | Eval name from registry |
| args.max_samples | Optional[int] | No | Limit number of samples evaluated |
| args.seed | int | No | Random seed (default: 20220722) |
| args.record_path | Optional[str] | No | Custom path for log output (default: /tmp/evallogs/) |
| args.extra_eval_params | str | No | Comma-separated key=value overrides for eval args |
| args.debug | bool | No | Enable debug logging |
| registry | Optional[Registry] | No | Pre-configured Registry instance |
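The extra_eval_params string is comma-separated key=value pairs. A plausible parser for that format is sketched below; the exact coercion rules in oaieval may differ, so treat this as an illustration of the contract, not the library's implementation:

```python
def parse_extra_eval_params(param_str: str) -> dict:
    """Parse overrides like "num_few_shot=3,max_tokens=200" into a dict."""
    if not param_str:
        return {}
    params = {}
    for pair in param_str.split(","):
        key, _, raw = pair.partition("=")
        # Coerce simple literal types; leave everything else as a string
        if raw in ("True", "False"):
            value = raw == "True"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw
        params[key.strip()] = value
    return params
```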
Outputs
| Name | Type | Description |
|---|---|---|
| run_id | str | Unique run identifier (format: YYMMDDHHMMSSxxxxx) |
| JSON log file | File | JSONL log at record_path with all events and final report |
| Console output | Text | Final report metrics printed to console |
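One way to produce an identifier matching the YYMMDDHHMMSSxxxxx format in the table (12-digit timestamp plus a 5-character suffix) is shown below. This is a sketch of the format, not necessarily how oaieval generates its ids:

```python
import random
from datetime import datetime, timezone

def make_run_id() -> str:
    # 12-digit UTC timestamp: YYMMDDHHMMSS
    timestamp = datetime.now(timezone.utc).strftime("%y%m%d%H%M%S")
    # 5-character random suffix (hex here; the real suffix alphabet is assumed)
    suffix = "%05x" % random.randrange(16 ** 5)
    return timestamp + suffix
```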
Usage Examples
CLI Usage
# Basic evaluation
oaieval gpt-3.5-turbo test-match
# With sample limit and custom output
oaieval gpt-4 my-eval --max_samples 100 --record_path ./results.jsonl
# With extra eval parameters
oaieval gpt-3.5-turbo my-eval --extra_eval_params "num_few_shot=3,max_tokens=200"
# Debug mode
oaieval gpt-3.5-turbo test-match --debug
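After a run, the JSONL log at record_path can be inspected to recover the final report. The helper below assumes one JSON object per line with the report carried under a "final_report" key; the exact event schema may vary between evals versions:

```python
import json

def load_final_report(log_path: str) -> dict:
    """Scan a JSONL eval log and return the final report, if present."""
    report = {}
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # The last line carrying a "final_report" key wins
            if "final_report" in event:
                report = event["final_report"]
    return report
```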
Programmatic Usage
from evals.cli.oaieval import run, OaiEvalArguments

# Build the argument object directly instead of going through argparse
args = OaiEvalArguments()
args.completion_fn = "gpt-3.5-turbo"  # model ID or registered completion fn
args.eval = "test-match"              # eval name from the registry
args.max_samples = 10                 # limit the number of samples evaluated
args.seed = 20220722
args.record_path = None               # default log location: /tmp/evallogs/
args.extra_eval_params = ""
args.completion_args = ""
args.cache = True
args.visible = None
args.debug = False
args.local_run = True                 # use the local JSONL recorder
args.http_run = False
args.http_run_url = None
args.http_batch_size = 100
args.http_fail_percent_threshold = 5
args.dry_run = False
args.dry_run_logging = True
args.log_to_file = None
args.registry_path = None
args.user = ""

run_id = run(args)
print(f"Completed run: {run_id}")