Environment:Haotian liu LLaVA OpenAI API Evaluation Environment

Knowledge Sources	Haotian-liu/LLaVA
Domains	Evaluation, NLP
Last Updated	2026-02-13 23:00 GMT

Overview

Python environment with OpenAI API access and Ray for parallel GPT-4-based evaluation scoring of LLaVA model outputs.

Description

This environment provides the additional dependencies required for GPT-4-based evaluation workflows. Beyond the core LLaVA training environment, it requires the `openai` Python client for ChatCompletion API calls and `ray` for parallelizing evaluation requests across multiple CPUs. The evaluation scripts use GPT-4 as a judge to score model-generated answers against reference answers.

Usage

Use this environment when running the Benchmark Evaluation workflow steps that involve GPT-4 review scoring: `eval_gpt_review.py`, `eval_gpt_review_bench.py`, `eval_gpt_review_visual.py`, and `eval_science_qa_gpt4.py`. These scripts call the OpenAI API and require a valid API key with GPT-4 access.

System Requirements

Category	Requirement	Notes
OS	Any (Linux, macOS, Windows)	CPU-only; no GPU required for evaluation scoring
Network	Internet access	Required for OpenAI API calls
API Access	OpenAI API with GPT-4 model access	Rate limits apply; scripts include retry with sleep

Dependencies

Python Packages

`openai` (ChatCompletion API client)
`ray` (distributed task execution, used with `@ray.remote(num_cpus=4)`)
`tqdm` (progress bars)

Credentials

The following environment variables must be set:

`OPENAI_API_KEY`: OpenAI API key with access to the `gpt-4` model. Required for all GPT-4 evaluation scripts.

Quick Install

pip install openai ray tqdm
export OPENAI_API_KEY="your-api-key-here"

Code Evidence

OpenAI API usage from `eval_gpt_review.py:12-36`:

@ray.remote(num_cpus=4)
def get_eval(content: str, max_tokens: int):
    while True:
        try:
            response = openai.ChatCompletion.create(
                model='gpt-4',
                messages=[{
                    'role': 'system',
                    'content': 'You are a helpful and precise assistant for checking the quality of the answer.'
                }, {
                    'role': 'user',
                    'content': content,
                }],
                temperature=0.2,
                max_tokens=max_tokens,
            )
            break
        except openai.error.RateLimitError:
            pass
        except Exception as e:
            print(e)
        time.sleep(NUM_SECONDS_TO_SLEEP)

Rate limit handling from `eval_gpt_review.py:10`:

NUM_SECONDS_TO_SLEEP = 3

Common Errors

Error Message	Cause	Solution
`openai.error.AuthenticationError`	Invalid or missing `OPENAI_API_KEY`	Set valid API key: `export OPENAI_API_KEY="sk-..."`
`openai.error.RateLimitError`	Too many API requests	Script handles this automatically with 3-second retry sleep
`ray.exceptions.RaySystemError`	Ray not initialized	Ensure `ray.init()` is called (done in script `__main__`)
Model `gpt-4` not available	API key lacks GPT-4 access	Upgrade OpenAI API plan or request GPT-4 access

Compatibility Notes

API Version: The evaluation scripts use the legacy `openai.ChatCompletion.create()` API (pre-v1.0 openai package). If using `openai>=1.0`, the scripts need modification.
Cost: Each evaluation run makes one GPT-4 API call per question. Large evaluation sets can incur significant costs.
Parallelism: Ray is configured with `num_cpus=4` per evaluation task. Adjust based on available CPU cores and API rate limits.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment