Environment: OpenAI Evals (OpenAI API Configuration)
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, API_Configuration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
OpenAI API key and runtime configuration environment for executing evaluations against OpenAI models.
Description
This environment defines the mandatory `OPENAI_API_KEY` credential and the suite of `EVALS_*` runtime configuration variables that control eval execution behavior. The OpenAI API key is loaded at module import time in `evals/registry.py` and is required for all evaluations that call OpenAI models. The `EVALS_*` variables control threading, timeouts, progress display, and sequential execution mode.
Usage
Use this environment whenever running evaluations against OpenAI models (e.g., gpt-3.5-turbo, gpt-4). The `OPENAI_API_KEY` is the single mandatory credential. The `EVALS_*` variables are optional but recommended for tuning performance and debugging.
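As a minimal sketch of a first run (the `oaieval` CLI ships with the evals repository; the eval name `test-match` is an assumption, substitute any registered eval):

```shell
# Export the mandatory credential (replace with your real key)
export OPENAI_API_KEY="sk-..."

# Smoke-test run against gpt-3.5-turbo; "test-match" is a hypothetical
# choice of eval, any name from the registry works here
oaieval gpt-3.5-turbo test-match
```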
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for OpenAI API calls |
| API Account | OpenAI account with API access | See https://platform.openai.com/account/api-keys |
Dependencies
Python Packages
- `openai` >= 1.0.0
Credentials
The following environment variables must be set:
- `OPENAI_API_KEY`: Required. OpenAI API key with read access. Used across `evals/registry.py:26`, `evals/completion_fns/openai.py`, `evals/completion_fns/retrieval.py`, and multiple eval suites.
Runtime Configuration Variables
- `EVALS_THREADS`: Number of parallel threads for eval execution. Default: `10`. Used in `evals/eval.py:124`.
- `EVALS_THREAD_TIMEOUT`: Timeout in seconds per thread before restart. Default: `40`. Used in `evals/utils/api_utils.py:6`.
- `EVALS_SEQUENTIAL`: Set to `1`, `true`, or `yes` to run evals sequentially instead of in parallel. Default: `0`. Used in `evals/eval.py:140`.
- `EVALS_SHOW_EVAL_PROGRESS`: Set to any non-empty value to show a progress bar during eval execution (the value is passed through `bool()`, so even `"0"` enables it). Used in `evals/eval.py:125`.
- `EVALS_GENTLE_INTERRUPT`: Enable gentle interrupt handling. Used in `evals/eval.py:242`.
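The parsing rules above can be sketched as one helper. This mirrors the type coercions cited in the Code Evidence section below; the function name `read_evals_config` is hypothetical, since the real code reads these variables inline in `evals/eval.py` and `evals/utils/api_utils.py`:

```python
import os

def read_evals_config() -> dict:
    """Collect the EVALS_* runtime settings with their documented defaults.

    Hypothetical helper for illustration; not part of the evals codebase.
    """
    return {
        # Number of worker threads (int, default 10)
        "threads": int(os.environ.get("EVALS_THREADS", "10")),
        # Per-thread timeout in seconds (float, default 40)
        "thread_timeout": float(os.environ.get("EVALS_THREAD_TIMEOUT", "40")),
        # Sequential mode is enabled only for these exact values
        "sequential": os.environ.get("EVALS_SEQUENTIAL", "0") in {"1", "true", "yes"},
        # Any non-empty value enables the progress bar: bool() of a
        # non-empty string is True, so even "0" turns it on
        "show_progress": bool(os.environ.get("EVALS_SHOW_EVAL_PROGRESS", "")),
    }
```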
Optional Snowflake Logging Credentials
- `SNOWFLAKE_ACCOUNT`: Snowflake account identifier.
- `SNOWFLAKE_DATABASE`: Snowflake database name.
- `SNOWFLAKE_USERNAME`: Snowflake login username.
- `SNOWFLAKE_PASSWORD`: Snowflake login password.
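Since Snowflake logging is optional, a reasonable pattern is to enable it only when all four variables are present. This is a sketch of that pattern, not the logging code shipped with evals; the helper name `snowflake_credentials` is hypothetical:

```python
import os
from typing import Optional

# Variable names are documented above; the helper itself is illustrative.
SNOWFLAKE_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_USERNAME",
    "SNOWFLAKE_PASSWORD",
)

def snowflake_credentials() -> Optional[dict]:
    """Return all four Snowflake settings, or None if any are missing."""
    values = {name: os.environ.get(name) for name in SNOWFLAKE_VARS}
    if not all(values.values()):
        return None  # treat an incomplete set as "logging disabled"
    return values
```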
Quick Install
```bash
# Set required API key
export OPENAI_API_KEY="sk-..."

# Optional: configure threading for faster execution
export EVALS_THREADS=20
export EVALS_THREAD_TIMEOUT=120

# Optional: run sequentially for debugging
export EVALS_SEQUENTIAL=1

# Optional: Snowflake logging
export SNOWFLAKE_ACCOUNT="your-account"
export SNOWFLAKE_DATABASE="your-db"
export SNOWFLAKE_USERNAME="your-user"
export SNOWFLAKE_PASSWORD="your-password"
```
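Because the key is read at module import time, a missing `OPENAI_API_KEY` can surface as an indirect error deep inside the library. A small preflight check gives a clearer failure; the helper name `check_openai_key` is hypothetical:

```python
import os

def check_openai_key() -> str:
    """Fail fast with a clear message if OPENAI_API_KEY is unset or empty.

    Illustrative sketch; evals itself reads the key at import time in
    evals/registry.py, which produces a less direct error message.
    """
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running evals."
        )
    return key
```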
Code Evidence
OpenAI client initialization from `evals/registry.py:26`:
```python
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
```
Thread configuration from `evals/eval.py:124-125`:
```python
threads = int(os.environ.get("EVALS_THREADS", "10"))
show_progress = bool(os.environ.get("EVALS_SHOW_EVAL_PROGRESS", show_progress))
```
Sequential mode check from `evals/eval.py:140-144`:
```python
if os.environ.get("EVALS_SEQUENTIAL", "0") in {"1", "true", "yes"}:
    logger.info("Running in sequential mode!")
    iter = map(eval_sample, work_items)
else:
    logger.info(f"Running in threaded mode with {threads} threads!")
    iter = pool.imap_unordered(eval_sample, work_items)
```
Thread timeout from `evals/utils/api_utils.py:6`:
```python
EVALS_THREAD_TIMEOUT = float(os.environ.get("EVALS_THREAD_TIMEOUT", "40"))
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `openai.AuthenticationError: Incorrect API key` | Invalid or missing OPENAI_API_KEY | Verify your API key at https://platform.openai.com/account/api-keys |
| `openai.RateLimitError` | Too many requests | Reduce `EVALS_THREADS` or wait for rate limit reset |
| `openai.APITimeoutError` | Request exceeded timeout | Increase `EVALS_THREAD_TIMEOUT` (e.g., to 120 or 600) |
| `ValueError: human_cli player is available only with EVALS_SEQUENTIAL=1` | Human CLI solver requires sequential mode | Set `EVALS_SEQUENTIAL=1` before running |
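For rate-limit errors, an alternative to lowering `EVALS_THREADS` is retrying the failing call with exponential backoff. This is a generic sketch, not code from the evals repository; in practice you would pass `retry_on=(openai.RateLimitError,)` and make `fn` the API call that hit the limit:

```python
import time

def with_backoff(fn, *, retries=5, base_delay=1.0, retry_on=(Exception,)):
    """Retry fn() with exponential backoff between attempts.

    Illustrative helper: delays grow as base_delay, 2*base_delay,
    4*base_delay, ...; the final failure is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))
```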
Compatibility Notes
- Rate limits: Running with more threads increases throughput but may trigger OpenAI rate limits. Monitor your usage tier and adjust `EVALS_THREADS` accordingly.
- Long prompts: For evals with long prompts or responses, increase `EVALS_THREAD_TIMEOUT` beyond the default 40 seconds.
- Human-in-the-loop: Evals using `HumanCliSolver` (e.g., bluff eval) require `EVALS_SEQUENTIAL=1` since interactive CLI input cannot be parallelized.
- Gemini solver: The Google Gemini solver has a known threading issue and tests force `EVALS_SEQUENTIAL=1` as a workaround (`evals/solvers/providers/google/gemini_solver_test.py:22`).
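One way to apply the sequential-mode workaround without leaking state into later runs is a small context manager. This is a sketch of the idea described above; the tests in the repository set the variable directly rather than via a context manager:

```python
import os
from contextlib import contextmanager

@contextmanager
def sequential_mode():
    """Temporarily force EVALS_SEQUENTIAL=1 around a non-thread-safe eval.

    Illustrative helper: saves any prior value and restores it on exit,
    so surrounding code sees the environment unchanged.
    """
    old = os.environ.get("EVALS_SEQUENTIAL")
    os.environ["EVALS_SEQUENTIAL"] = "1"
    try:
        yield
    finally:
        if old is None:
            os.environ.pop("EVALS_SEQUENTIAL", None)
        else:
            os.environ["EVALS_SEQUENTIAL"] = old
```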