Environment:OpenAI Evals Python Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, LLM_Evaluation |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Python 3.9+ runtime environment with ~50 pip dependencies for running the OpenAI Evals framework.
Description
This environment defines the base Python runtime and all required pip packages for the OpenAI Evals framework. The project is distributed as a standard Python package installable via pip. It requires Python 3.9 or higher and includes dependencies for API communication (openai, anthropic, google-generativeai), data processing (pandas, numpy, datasets), evaluation metrics (sacrebleu, jiwer, nltk), and various utility libraries. An optional torch dependency is available for GPU-based evaluation tasks. An optional formatters group provides code formatting tools (black, isort, autoflake, ruff) for contributors.
Usage
Use this environment for all OpenAI Evals workflows. It is the mandatory prerequisite for running any eval via the `oaieval` or `oaievalset` CLI commands, building custom evals, and developing custom completion functions.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | Any OS with Python 3.9+ support |
| Python | >= 3.9 | Declared as `requires-python` in `pyproject.toml` |
| Disk | ~2 GB free | For package installation and Git-LFS eval data |
| Network | Internet access required | For OpenAI API calls and downloading eval datasets |
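You can check the interpreter against the `requires-python` floor before installing, using only the standard library. This is a minimal sketch. `python_is_supported` is a hypothetical helper and is not part of the `evals` package.

```python
import sys

# Floor taken from `requires-python = ">=3.9"` in pyproject.toml.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the >= 3.9 requirement")
    else:
        sys.exit(f"Python {current} is too old; OpenAI Evals requires >= 3.9")
```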
Dependencies
System Packages
- `python` >= 3.9
- `git-lfs` (for fetching eval registry data files)
- `git` (for repository management)
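When you run `git lfs ...`, Git calls a separate `git-lfs` executable. That means both tools can be located on `PATH` with `shutil.which`. This is a sketch with a hypothetical helper name. It leaves out `python` because the check is already running under Python.

```python
import shutil

# Command-line tools listed under "System Packages".
REQUIRED_TOOLS = ("git", "git-lfs")


def find_missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("git and git-lfs are available")
```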
Python Packages (Core)
- `openai` >= 1.0.0
- `anthropic`
- `google-generativeai`
- `beartype` >= 0.12.0
- `backoff`
- `aiolimiter`
- `blobfile`
- `dacite`
- `datasets`
- `docker`
- `evaluate`
- `filelock`
- `fire`
- `flask`
- `gymnasium`
- `langchain`
- `pydantic`
- `pyyaml`
- `tiktoken`
- `tqdm`
- `termcolor`
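In the list above, only `openai` and `beartype` have version floors. The sketch below checks both against installed metadata with `importlib.metadata`. The helpers are hypothetical and the version parsing is deliberately simplified. For full PEP 440 ordering, the third-party `packaging` library is the robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in the core package list above.
VERSION_FLOORS = {"openai": (1, 0, 0), "beartype": (0, 12, 0)}


def release_tuple(version_string: str) -> tuple:
    """Extract the leading numeric release, e.g. '1.2.3rc1' -> (1, 2, 3).

    Simplification: pre-, post- and dev-release suffixes are ignored (so
    '1.0.0rc1' compares equal to '1.0.0') and epochs are not handled.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unparseable version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_floor(installed: str, floor: tuple) -> bool:
    """Compare release tuples after zero-padding them to the same length."""
    release = release_tuple(installed)
    width = max(len(release), len(floor))
    release += (0,) * (width - len(release))
    floor += (0,) * (width - len(floor))
    return release >= floor


def check_floor(dist: str, floor: tuple) -> str:
    """Return a one-line status for a distribution's installed version."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return f"{dist}: not installed"
    status = "ok" if meets_floor(installed, floor) else "too old"
    return f"{dist} {installed}: {status}"


if __name__ == "__main__":
    for dist, floor in VERSION_FLOORS.items():
        print(check_floor(dist, floor))
```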
Python Packages (Data & Metrics)
- `numpy`
- `pandas`
- `matplotlib`
- `seaborn`
- `statsmodels`
- `sacrebleu`
- `jiwer`
- `nltk`
- `langdetect`
- `numexpr`
- `networkx`
- `spacy-universal-sentence-encoder`
Python Packages (Compression & Storage)
- `lz4`
- `zstandard`
- `snowflake-connector-python[pandas]`
Python Packages (Testing & Types)
- `pytest`
- `mock`
- `mypy`
- `types-PyYAML`
- `types-tqdm`
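After installation, you can find gaps by looking up each listed distribution in the installed package metadata. This sketch assumes the lists above are complete and use PyPI spelling. It only confirms that each distribution is installed, not that it imports. For example, `pyyaml` is imported as `yaml`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the package lists above.
EXPECTED_DISTRIBUTIONS = [
    # Core
    "openai", "anthropic", "google-generativeai", "beartype", "backoff",
    "aiolimiter", "blobfile", "dacite", "datasets", "docker", "evaluate",
    "filelock", "fire", "flask", "gymnasium", "langchain", "pydantic",
    "pyyaml", "tiktoken", "tqdm", "termcolor",
    # Data & metrics
    "numpy", "pandas", "matplotlib", "seaborn", "statsmodels", "sacrebleu",
    "jiwer", "nltk", "langdetect", "numexpr", "networkx",
    "spacy-universal-sentence-encoder",
    # Compression & storage
    "lz4", "zstandard", "snowflake-connector-python[pandas]",
    # Testing & types
    "pytest", "mock", "mypy", "types-PyYAML", "types-tqdm",
]


def base_name(requirement: str) -> str:
    """Drop an extras suffix, e.g. 'pkg[extra]' -> 'pkg'."""
    return requirement.split("[", 1)[0]


def missing_distributions(requirements=EXPECTED_DISTRIBUTIONS) -> list:
    """Return the requirements whose distribution has no installed metadata.

    Note: how strictly lookups normalize case and '-'/'_' differs between
    Python versions, so a reported miss may just be a spelling mismatch.
    """
    missing = []
    for requirement in requirements:
        try:
            version(base_name(requirement))
        except PackageNotFoundError:
            missing.append(requirement)
    return missing


if __name__ == "__main__":
    gaps = missing_distributions()
    print("All listed distributions found" if not gaps else f"Missing: {', '.join(gaps)}")
```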
Optional Dependencies
- `torch` (optional, for GPU-based eval tasks)
- `black`, `isort`, `autoflake`, `ruff` (optional, for code formatting via `pip install -e ".[formatters]"`)
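For the optional `torch` extra, this sketch reports whether torch is installed and whether it can see a CUDA GPU. It uses `importlib.util.find_spec` together with `torch.cuda.is_available()` and `torch.cuda.device_count()`. The helper name is hypothetical. Non-CUDA accelerators, such as Apple MPS, are not checked.

```python
import importlib.util


def torch_gpu_status() -> str:
    """Describe whether the optional torch extra is installed and sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return 'torch not installed (optional): pip install -e ".[torch]"'
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        return f"torch {torch.__version__}: {count} CUDA device(s) visible"
    return f"torch {torch.__version__}: installed, but no CUDA device is visible"


if __name__ == "__main__":
    print(torch_gpu_status())
```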
Credentials
The following environment variable must be set:
- `OPENAI_API_KEY`: Required. OpenAI API key for running evaluations against OpenAI models. Used in `evals/registry.py`, `evals/completion_fns/openai.py`, and multiple eval suites.
Optional credentials, needed only for the Snowflake logging backend:
- `SNOWFLAKE_ACCOUNT`: Snowflake account identifier.
- `SNOWFLAKE_DATABASE`: Snowflake database name.
- `SNOWFLAKE_USERNAME`: Snowflake login username.
- `SNOWFLAKE_PASSWORD`: Snowflake login password.
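The sketch below is a pre-flight credential check. It takes the environment as a mapping, so you can test it without exposing real keys. The helper name is hypothetical. The rule that all four Snowflake variables should be set together is an assumption made for illustration, not documented behavior of the Snowflake backend.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("OPENAI_API_KEY",)
SNOWFLAKE_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_USERNAME",
    "SNOWFLAKE_PASSWORD",
)


def credential_problems(env: Mapping[str, str] = os.environ) -> list:
    """List configuration problems; an empty list means the checks passed."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    present = [name for name in SNOWFLAKE_VARS if env.get(name)]
    # Assumption: Snowflake logging needs all four values, so a partial set
    # is reported as a likely mistake rather than silently ignored.
    if present and len(present) < len(SNOWFLAKE_VARS):
        missing = [name for name in SNOWFLAKE_VARS if name not in present]
        problems.append("Snowflake settings incomplete; missing: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    for problem in credential_problems():
        print("WARNING:", problem)
```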
Quick Install
```bash
# Clone and install in development mode
git clone https://github.com/openai/evals.git
cd evals
git lfs fetch --all
git lfs pull
pip install -e .

# Optional: install formatters for contributing
pip install -e ".[formatters]"
pre-commit install

# Set required API key
export OPENAI_API_KEY="your-key-here"
```
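To confirm that `pip install -e .` registered the CLI commands, compare the installed distribution's console-script metadata with the `[project.scripts]` table in `pyproject.toml`. This is a minimal sketch. It assumes the distribution name `evals` from `pyproject.toml`, and the helper names are hypothetical. It inspects package metadata, so it also works when the scripts directory is not on `PATH`.

```python
from importlib.metadata import PackageNotFoundError, distribution

# Console scripts declared under [project.scripts] in pyproject.toml.
EXPECTED_SCRIPTS = {
    "oaieval": "evals.cli.oaieval:main",
    "oaievalset": "evals.cli.oaievalset:main",
}


def registered_scripts(dist_name: str = "evals") -> dict:
    """Map console-script names to their targets for an installed distribution."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return {}
    return {ep.name: ep.value for ep in dist.entry_points if ep.group == "console_scripts"}


def install_looks_complete(dist_name: str = "evals") -> bool:
    """True if every expected script is registered with the expected target."""
    scripts = registered_scripts(dist_name)
    return all(scripts.get(name) == target for name, target in EXPECTED_SCRIPTS.items())


if __name__ == "__main__":
    print("evals install complete:", install_looks_complete())
```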
Code Evidence
Python version requirement from `pyproject.toml:4`:
```toml
[project]
name = "evals"
version = "3.0.1.post1"
requires-python = ">=3.9"
```
Minimum `openai` package version from `pyproject.toml:33`:
```toml
dependencies = [
    ...
    "openai>=1.0.0",
    ...
]
```
CLI entry points from `pyproject.toml:61-62`:
```toml
[project.scripts]
oaieval = "evals.cli.oaieval:main"
oaievalset = "evals.cli.oaievalset:main"
```
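The quoted `pyproject.toml` fields can also be read programmatically. This sketch uses the standard-library `tomllib` (Python 3.11+) and falls back to the third-party `tomli` package, which has the same `loads` API, on Python 3.9 and 3.10. The excerpt drops the `...` elisions so that it parses as valid TOML. `summarize` is a hypothetical helper.

```python
import re
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:  # On Python 3.9/3.10, the third-party `tomli` package provides the same API.
    import tomli as tomllib

# Valid TOML excerpt of the fields quoted above, with the "..." elisions dropped.
PYPROJECT_EXCERPT = """
[project]
name = "evals"
version = "3.0.1.post1"
requires-python = ">=3.9"
dependencies = ["openai>=1.0.0"]

[project.scripts]
oaieval = "evals.cli.oaieval:main"
oaievalset = "evals.cli.oaievalset:main"
"""

# Leading distribution name of a requirement string such as "openai>=1.0.0".
_NAME = re.compile(r"[A-Za-z0-9._-]+")


def _dist_name(requirement: str) -> str:
    match = _NAME.match(requirement.strip())
    return match.group(0).lower() if match else ""


def summarize(pyproject_text: str) -> dict:
    """Pull the Python floor, the openai requirement and the CLI scripts."""
    project = tomllib.loads(pyproject_text)["project"]
    dependencies = project.get("dependencies", [])
    return {
        "requires_python": project.get("requires-python"),
        "openai_requirement": next(
            (req for req in dependencies if _dist_name(req) == "openai"), None
        ),
        "scripts": project.get("scripts", {}),
    }


if __name__ == "__main__":
    print(summarize(PYPROJECT_EXCERPT))
```

To inspect the real file in a checkout, pass `Path("pyproject.toml").read_text()` to `summarize` instead of the excerpt.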
Git-LFS requirement from `README.md:19-24`:
```bash
cd evals
git lfs fetch --all
git lfs pull
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'evals'` | Package not installed | Run `pip install -e .` from the repo root |
| `openai.AuthenticationError` | Missing or invalid API key | Set `OPENAI_API_KEY` environment variable |
| Git-LFS pointer files instead of data | Git-LFS not installed or data not fetched | Install git-lfs, then run `git lfs fetch --all && git lfs pull` |
| `ImportError: No module named 'multiprocess'` | Python 3.10 compatibility issue in steganography/text_compression evals | Run `pip install multiprocess==0.70.15` after main install |
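A Git-LFS pointer file is a small text stub whose first line is `version https://git-lfs.github.com/spec/v1`. You can therefore detect un-fetched data by reading the first bytes of each file. This is a sketch with hypothetical helper names. It assumes `evals/registry/data` is where the LFS-tracked registry data lives.

```python
from pathlib import Path

# First line of every Git-LFS pointer file (per the Git-LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the leading bytes of a file match a Git-LFS pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def find_pointer_files(root) -> list:
    """Return files under root that are still LFS pointers rather than real data."""
    root = Path(root)
    if not root.is_dir():
        return []
    pointers = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                if looks_like_lfs_pointer(handle.read(len(LFS_POINTER_PREFIX))):
                    pointers.append(path)
    return pointers


if __name__ == "__main__":
    # Hypothetical location of the registry data in a local checkout.
    leftovers = find_pointer_files("evals/registry/data")
    print(f"{len(leftovers)} pointer file(s) found; run `git lfs pull` if nonzero")
```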
Compatibility Notes
- Python 3.10: Some evals (steganography, text_compression) may require an additional `pip install multiprocess==0.70.15` to work around a known compatibility issue.
- Windows: Docker-based evals (multistep_web_tasks) may require WSL2 or a Linux environment.
- torch: The torch dependency is optional. Install via `pip install -e ".[torch]"` only if running GPU-dependent eval tasks.
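The notes above can be folded into a small pre-flight helper. This sketch encodes only the two rules stated in this section: the Python 3.10 `multiprocess` pin and the Windows Docker caveat. The function name is hypothetical.

```python
import sys


def compatibility_hints(version_info=sys.version_info, platform=sys.platform) -> list:
    """Return the compatibility notes from this page that apply to a runtime."""
    hints = []
    if tuple(version_info[:2]) == (3, 10):
        hints.append(
            "steganography/text_compression evals may need: pip install multiprocess==0.70.15"
        )
    if platform == "win32":
        hints.append("Docker-based evals (multistep_web_tasks) may need WSL2 or Linux")
    return hints


if __name__ == "__main__":
    for hint in compatibility_hints():
        print("NOTE:", hint)
```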