Environment: OpenAI Evals Python Runtime

Knowledge Sources	OpenAI Evals pyproject.toml
Domains	Infrastructure, LLM_Evaluation
Last Updated	2026-02-14 10:00 GMT

Overview

Python 3.9+ runtime environment with ~50 pip dependencies for running the OpenAI Evals framework.

Description

This environment defines the base Python runtime and all required pip packages for the OpenAI Evals framework. The project is distributed as a standard Python package installable via pip. It requires Python 3.9 or higher and includes dependencies for API communication (openai, anthropic, google-generativeai), data processing (pandas, numpy, datasets), evaluation metrics (sacrebleu, jiwer, nltk), and various utility libraries. An optional torch dependency is available for GPU-based evaluation tasks. An optional formatters group provides code formatting tools (black, isort, autoflake, ruff) for contributors.

Usage

Use this environment for all OpenAI Evals workflows. It is the mandatory prerequisite for running any eval via the `oaieval` or `oaievalset` CLI commands, building custom evals, and developing custom completion functions.

System Requirements

Category	Requirement	Notes
OS	Linux, macOS, or Windows	Any OS with Python 3.9+ support
Python	>= 3.9	Stated in pyproject.toml `requires-python`
Disk	~2 GB free	For package installation and Git-LFS eval data
Network	Internet access required	For OpenAI API calls and downloading eval datasets
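
The Python version floor in the table above can be checked before installing anything. The following is a minimal sketch, assuming only the standard library; the function name `meets_minimum` and the constant `MINIMUM_PYTHON` are illustrative and not part of the Evals codebase.

```python
import sys

# Mirrors the `requires-python = ">=3.9"` constraint described above.
MINIMUM_PYTHON = (3, 9)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print(f"Python {found} satisfies >= 3.9")
    else:
        print(f"Python {found} is too old; 3.9 or newer is required")
```

Comparing only the major and minor components keeps the check independent of patch releases.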

Dependencies

System Packages

`python` >= 3.9
`git-lfs` (for fetching eval registry data files)
`git` (for repository management)

Python Packages (Core)

`openai` >= 1.0.0
`anthropic`
`google-generativeai`
`beartype` >= 0.12.0
`backoff`
`aiolimiter`
`blobfile`
`dacite`
`datasets`
`docker`
`evaluate`
`filelock`
`fire`
`flask`
`gymnasium`
`langchain`
`pydantic`
`pyyaml`
`tiktoken`
`tqdm`
`termcolor`

Python Packages (Data & Metrics)

`numpy`
`pandas`
`matplotlib`
`seaborn`
`statsmodels`
`sacrebleu`
`jiwer`
`nltk`
`langdetect`
`numexpr`
`networkx`
`spacy-universal-sentence-encoder`

Python Packages (Compression & Storage)

`lz4`
`zstandard`
`snowflake-connector-python[pandas]`

Python Packages (Testing & Types)

`pytest`
`mock`
`mypy`
`types-PyYAML`
`types-tqdm`

Optional Dependencies

`torch` (optional, for GPU-based eval tasks via `pip install -e ".[torch]"`)
`black`, `isort`, `autoflake`, `ruff` (optional, for code formatting via `pip install -e ".[formatters]"`)
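
After installation, one way to confirm that the packages listed above are present is to query installed distribution metadata. This is a rough sketch using the standard library's `importlib.metadata`; the helper name `missing_distributions` and the sample list are assumptions made for illustration, and the sample is only a handful of names from the lists above, not the full dependency set.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hand-picked sample; extend it with whichever packages matter to you.
    sample = ["openai", "pandas", "tiktoken", "pyyaml", "torch"]
    for name in missing_distributions(sample):
        print(f"not installed: {name}")
```

Since `torch` is optional, seeing it reported as missing is expected unless the `torch` extra was installed.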

Credentials

The following environment variables must be set:

`OPENAI_API_KEY`: Required. OpenAI API key for running evaluations against OpenAI models. Used in `evals/registry.py`, `evals/completion_fns/openai.py`, and multiple eval suites.

Optional credentials for extended functionality:

`SNOWFLAKE_ACCOUNT`: Snowflake account identifier (for Snowflake logging backend).
`SNOWFLAKE_DATABASE`: Snowflake database name.
`SNOWFLAKE_USERNAME`: Snowflake login username.
`SNOWFLAKE_PASSWORD`: Snowflake login password.
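
A preflight check for the variables above might look like the sketch below. It only inspects the environment and does not validate the key against the API; the helper name `missing_vars` and the two list constants are hypothetical and do not come from the Evals source.

```python
import os

# Variable names taken from the lists above.
REQUIRED_VARS = ["OPENAI_API_KEY"]
SNOWFLAKE_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_USERNAME",
    "SNOWFLAKE_PASSWORD",
]


def missing_vars(names, env=None):
    """Return the names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_vars(REQUIRED_VARS):
        print(f"required variable not set: {name}")
    for name in missing_vars(SNOWFLAKE_VARS):
        print(f"optional variable not set: {name}")
```

Treating an empty string as missing catches the common case of an `export` line with a blank value.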

Quick Install

# Clone and install in development mode
git clone https://github.com/openai/evals.git
cd evals
git lfs fetch --all
git lfs pull
pip install -e .

# Optional: install formatters for contributing
pip install -e ".[formatters]"
pre-commit install

# Set required API key
export OPENAI_API_KEY="your-key-here"

Code Evidence

Python version requirement from `pyproject.toml:4`:

[project]
name = "evals"
version = "3.0.1.post1"
requires-python = ">=3.9"

OpenAI minimum version from `pyproject.toml:33`:

dependencies = [
    ...
    "openai>=1.0.0",
    ...
]

CLI entry points from `pyproject.toml:61-62`:

[project.scripts]
oaieval = "evals.cli.oaieval:main"
oaievalset = "evals.cli.oaievalset:main"
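
Each `[project.scripts]` value uses the `module:attribute` form: the generated console script imports the module and calls the named object. The sketch below illustrates that resolution step in plain Python. The function `resolve_entry_point` is a hypothetical helper written for this page, not the code that pip or setuptools actually generates.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attr' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "Class.method"); walk each piece.
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return obj


if __name__ == "__main__":
    # With the package installed, "evals.cli.oaieval:main" would resolve the
    # same way; a standard-library target keeps this example self-contained.
    dumps = resolve_entry_point("json:dumps")
    print(dumps({"resolved": True}))
```

If `oaieval` is not found on the PATH after installation, resolving the spec this way in the same interpreter can help distinguish a missing package from a PATH problem.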

Git-LFS requirement from `README.md:19-24`:

cd evals
git lfs fetch --all
git lfs pull

Common Errors

Error Message	Cause	Solution
`ModuleNotFoundError: No module named 'evals'`	Package not installed	Run `pip install -e .` from the repo root
`openai.AuthenticationError`	Missing or invalid API key	Set `OPENAI_API_KEY` environment variable
Git-LFS pointer files instead of data	Git-LFS not installed or data not fetched	Install git-lfs, then run `git lfs fetch --all && git lfs pull`
`ImportError: No module named 'multiprocess'`	Python 3.10 compatibility issue in steganography/text_compression evals	Run `pip install multiprocess==0.70.15` after main install
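
The Git-LFS pointer problem in the table above can be detected programmatically, because a pointer file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`. The following is a minimal sketch; the helper names, the `*.jsonl` pattern, and the `evals/registry/data` path used in the usage section are assumptions about where eval data lives in a checkout.

```python
from pathlib import Path

# First bytes of every Git-LFS pointer file, per the Git-LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git-LFS pointer header."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root, pattern="*.jsonl"):
    """Return sorted paths under `root` that are still unfetched pointers."""
    return sorted(
        p for p in Path(root).rglob(pattern) if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    data_dir = Path("evals/registry/data")  # assumed location, run from repo root
    if data_dir.is_dir():
        for pointer in find_lfs_pointers(data_dir):
            print(f"unfetched LFS pointer: {pointer}")
```

Any paths it prints indicate data that still needs `git lfs fetch --all && git lfs pull`.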

Compatibility Notes

Python 3.10: Some evals (steganography, text_compression) may require an additional `pip install multiprocess==0.70.15` to work around a known compatibility issue.
Windows: Docker-based evals (multistep_web_tasks) may require WSL2 or a Linux environment.
torch: The torch dependency is optional. Install via `pip install -e ".[torch]"` only if running GPU-dependent eval tasks.
