Environment: OpenAI Evals (OpenAI API Configuration)
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, API_Configuration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
OpenAI API key and runtime configuration environment for executing evaluations against OpenAI models.
Description
This environment defines the mandatory `OPENAI_API_KEY` credential and the suite of `EVALS_*` runtime configuration variables that control eval execution behavior. The OpenAI API key is loaded at module import time in `evals/registry.py` and is required for all evaluations that call OpenAI models. The `EVALS_*` variables control threading, timeouts, progress display, and sequential execution mode.
Usage
Use this environment whenever running evaluations against OpenAI models (e.g., gpt-3.5-turbo, gpt-4). The `OPENAI_API_KEY` is the single mandatory credential. The `EVALS_*` variables are optional but recommended for tuning performance and debugging.
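As a minimal sketch of a first run (the `oaieval` CLI ships with the evals repository; the eval name `test-match` is an assumption, substitute any registered eval):

```shell
# Export the mandatory credential (replace with your real key)
export OPENAI_API_KEY="sk-..."

# Smoke-test run against gpt-3.5-turbo; "test-match" is a hypothetical
# choice of eval, any name from the registry works here
oaieval gpt-3.5-turbo test-match
```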
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for OpenAI API calls |
| API Account | OpenAI account with API access | See https://platform.openai.com/account/api-keys |
Dependencies
Python Packages
- `openai` >= 1.0.0
Credentials
The following environment variables must be set:
- `OPENAI_API_KEY`: Required. OpenAI API key with read access. Used across `evals/registry.py:26`, `evals/completion_fns/openai.py`, `evals/completion_fns/retrieval.py`, and multiple eval suites.
Runtime Configuration Variables
- `EVALS_THREADS`: Number of parallel threads for eval execution. Default: `10`. Used in `evals/eval.py:124`.
- `EVALS_THREAD_TIMEOUT`: Timeout in seconds per thread before restart. Default: `40`. Used in `evals/utils/api_utils.py:6`.
- `EVALS_SEQUENTIAL`: Set to `1`, `true`, or `yes` to run evals sequentially instead of in parallel. Default: `0`. Used in `evals/eval.py:140`.
- `EVALS_SHOW_EVAL_PROGRESS`: Set to any non-empty value to show a progress bar during eval execution (the value is passed through `bool()`, so even `"0"` enables it). Used in `evals/eval.py:125`.
- `EVALS_GENTLE_INTERRUPT`: Enable gentle interrupt handling. Used in `evals/eval.py:242`.
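The parsing rules above can be sketched as one helper. This mirrors the type coercions cited in the Code Evidence section below; the function name `read_evals_config` is hypothetical, since the real code reads these variables inline in `evals/eval.py` and `evals/utils/api_utils.py`:

```python
import os

def read_evals_config() -> dict:
    """Collect the EVALS_* runtime settings with their documented defaults.

    Hypothetical helper for illustration; not part of the evals codebase.
    """
    return {
        # Number of worker threads (int, default 10)
        "threads": int(os.environ.get("EVALS_THREADS", "10")),
        # Per-thread timeout in seconds (float, default 40)
        "thread_timeout": float(os.environ.get("EVALS_THREAD_TIMEOUT", "40")),
        # Sequential mode is enabled only for these exact values
        "sequential": os.environ.get("EVALS_SEQUENTIAL", "0") in {"1", "true", "yes"},
        # Any non-empty value enables the progress bar: bool() of a
        # non-empty string is True, so even "0" turns it on
        "show_progress": bool(os.environ.get("EVALS_SHOW_EVAL_PROGRESS", "")),
    }
```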
Optional Snowflake Logging Credentials
- `SNOWFLAKE_ACCOUNT`: Snowflake account identifier.
- `SNOWFLAKE_DATABASE`: Snowflake database name.
- `SNOWFLAKE_USERNAME`: Snowflake login username.
- `SNOWFLAKE_PASSWORD`: Snowflake login password.
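Since Snowflake logging is optional, a reasonable pattern is to enable it only when all four variables are present. This is a sketch of that pattern, not the logging code shipped with evals; the helper name `snowflake_credentials` is hypothetical:

```python
import os
from typing import Optional

# Variable names are documented above; the helper itself is illustrative.
SNOWFLAKE_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_USERNAME",
    "SNOWFLAKE_PASSWORD",
)

def snowflake_credentials() -> Optional[dict]:
    """Return all four Snowflake settings, or None if any are missing."""
    values = {name: os.environ.get(name) for name in SNOWFLAKE_VARS}
    if not all(values.values()):
        return None  # treat an incomplete set as "logging disabled"
    return values
```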
Quick Install
```bash
# Set required API key
export OPENAI_API_KEY="sk-..."

# Optional: configure threading for faster execution
export EVALS_THREADS=20
export EVALS_THREAD_TIMEOUT=120

# Optional: run sequentially for debugging
export EVALS_SEQUENTIAL=1

# Optional: Snowflake logging
export SNOWFLAKE_ACCOUNT="your-account"
export SNOWFLAKE_DATABASE="your-db"
export SNOWFLAKE_USERNAME="your-user"
export SNOWFLAKE_PASSWORD="your-password"
```
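Because the key is read at module import time, a missing `OPENAI_API_KEY` can surface as an indirect error deep inside the library. A small preflight check gives a clearer failure; the helper name `check_openai_key` is hypothetical:

```python
import os

def check_openai_key() -> str:
    """Fail fast with a clear message if OPENAI_API_KEY is unset or empty.

    Illustrative sketch; evals itself reads the key at import time in
    evals/registry.py, which produces a less direct error message.
    """
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running evals."
        )
    return key
```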
Code Evidence
OpenAI client initialization from `evals/registry.py:26`:
```python
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
```
Thread configuration from `evals/eval.py:124-125`:
```python
threads = int(os.environ.get("EVALS_THREADS", "10"))
show_progress = bool(os.environ.get("EVALS_SHOW_EVAL_PROGRESS", show_progress))
```
Sequential mode check from `evals/eval.py:140-144`:
```python
if os.environ.get("EVALS_SEQUENTIAL", "0") in {"1", "true", "yes"}:
    logger.info("Running in sequential mode!")
    iter = map(eval_sample, work_items)
else:
    logger.info(f"Running in threaded mode with {threads} threads!")
    iter = pool.imap_unordered(eval_sample, work_items)
```
Thread timeout from `evals/utils/api_utils.py:6`:
```python
EVALS_THREAD_TIMEOUT = float(os.environ.get("EVALS_THREAD_TIMEOUT", "40"))
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `openai.AuthenticationError: Incorrect API key` | Invalid or missing OPENAI_API_KEY | Verify your API key at https://platform.openai.com/account/api-keys |
| `openai.RateLimitError` | Too many requests | Reduce `EVALS_THREADS` or wait for rate limit reset |
| `openai.APITimeoutError` | Request exceeded timeout | Increase `EVALS_THREAD_TIMEOUT` (e.g., to 120 or 600) |
| `ValueError: human_cli player is available only with EVALS_SEQUENTIAL=1` | Human CLI solver requires sequential mode | Set `EVALS_SEQUENTIAL=1` before running |
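For rate-limit errors, an alternative to lowering `EVALS_THREADS` is retrying the failing call with exponential backoff. This is a generic sketch, not code from the evals repository; in practice you would pass `retry_on=(openai.RateLimitError,)` and make `fn` the API call that hit the limit:

```python
import time

def with_backoff(fn, *, retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry fn() with exponential backoff between attempts.

    Illustrative helper: delays grow as base_delay, 2*base_delay,
    4*base_delay, ...; the final failure is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```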
Compatibility Notes
- Rate limits: Running with more threads increases throughput but may trigger OpenAI rate limits. Monitor your usage tier and adjust `EVALS_THREADS` accordingly.
- Long prompts: For evals with long prompts or responses, increase `EVALS_THREAD_TIMEOUT` beyond the default 40 seconds.
- Human-in-the-loop: Evals using `HumanCliSolver` (e.g., bluff eval) require `EVALS_SEQUENTIAL=1` since interactive CLI input cannot be parallelized.
- Gemini solver: The Google Gemini solver has a known threading issue and tests force `EVALS_SEQUENTIAL=1` as a workaround (`evals/solvers/providers/google/gemini_solver_test.py:22`).
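One way to apply the sequential-mode workaround without leaking state into later runs is a small context manager. This is a sketch of the idea described above; the tests in the repository set the variable directly rather than via a context manager:

```python
import os
from contextlib import contextmanager

@contextmanager
def sequential_mode():
    """Temporarily force EVALS_SEQUENTIAL=1 around a non-thread-safe eval.

    Illustrative helper: saves any prior value and restores it on exit,
    so surrounding code sees the environment unchanged.
    """
    old = os.environ.get("EVALS_SEQUENTIAL")
    os.environ["EVALS_SEQUENTIAL"] = "1"
    try:
        yield
    finally:
        if old is None:
            os.environ.pop("EVALS_SEQUENTIAL", None)
        else:
            os.environ["EVALS_SEQUENTIAL"] = old
```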