Environment: EvolvingLMMs-Lab lmms-eval API Credentials
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, API_Integration |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
API credentials and environment variables required for external service integrations including HuggingFace Hub, OpenAI GPT judges, W&B logging, and cloud model providers.
Description
This environment defines all API keys, tokens, and configuration environment variables used by the lmms-eval framework. These are needed for: (1) accessing gated models and datasets on HuggingFace Hub, (2) using OpenAI or Azure GPT models as evaluation judges, (3) logging evaluation results to Weights & Biases, and (4) running evaluations against cloud-hosted model APIs (Gemini, Reka, xAI Grok, etc.). The framework uses python-dotenv to load credentials from .env files.
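The loading flow described above can be sketched as follows (a minimal illustration of the python-dotenv pattern; the variable names match those documented below, and the fallback when python-dotenv is absent is an assumption for self-containment):

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package

    # Merge key=value pairs from ./.env into os.environ
    # (existing environment variables are not overwritten by default).
    load_dotenv()
except ImportError:
    pass  # python-dotenv not installed; rely on exported variables only

# Credentials are then read from the environment on demand.
hf_token = os.environ.get("HF_TOKEN")
openai_key = os.environ.get("OPENAI_API_KEY")

if hf_token is None:
    print("HF_TOKEN not set; gated HuggingFace models/datasets will be inaccessible.")
```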
Usage
Use this environment when running evaluations that require external API access. This includes tasks using GPT-based judges (e.g., ActivityNetQA, VDC, WildVisionBench), logging to W&B, pushing results to HuggingFace Hub, or evaluating cloud-hosted models.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Network | Internet access | Required for API calls to external services |
| Credentials | API keys in environment or .env file | Never commit actual keys to source control |
Dependencies
Python Packages
- python-dotenv — loads .env files automatically
- openai — OpenAI API client for GPT judges
- wandb >= 0.16.0 — Weights & Biases logging
- httpx >= 0.23.3 — HTTP client for API providers
Credentials
NEVER store actual secret values in code or documentation. The following environment variables must be set:
HuggingFace:
HF_TOKEN: HuggingFace API token for accessing gated models/datasets and pushing results to Hub.
OpenAI / Azure:
OPENAI_API_KEY: OpenAI API key for GPT-based evaluation judges and direct model evaluation.
AZURE_API_KEY: Azure OpenAI API key (alternative to direct OpenAI).
Logging:
WANDB_API_KEY: Weights & Biases API key for experiment logging.
Task-Specific:
VIESCORE_API_KEY: VIEScore API key for GEdit-Bench image editing evaluation.
Framework Configuration:
LMMS_EVAL_HOME: Cache directory path (default: ~/.cache/lmms-eval).
LMMS_EVAL_USE_CACHE: Enable response caching (default: "False").
LMMS_EVAL_PLUGINS: Comma-separated list of plugin package paths.
LMMS_EVAL_SHUFFLE_DOCS: Shuffle dataset documents (used by some tasks).
VERBOSITY: Logging level (DEBUG, INFO, WARNING, ERROR).
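These configuration variables are read with defaults, as sketched below (the first two lines mirror the pattern shown in Code Evidence; the plugin parsing and the INFO default for VERBOSITY are illustrative assumptions):

```python
import os

# Cache location and caching toggle, with the documented defaults.
LMMS_EVAL_HOME = os.path.expanduser(os.getenv("LMMS_EVAL_HOME", "~/.cache/lmms-eval"))
LMMS_EVAL_USE_CACHE = os.getenv("LMMS_EVAL_USE_CACHE", "False")

# Plugins are given as a comma-separated list of importable package names.
plugins = [p for p in os.getenv("LMMS_EVAL_PLUGINS", "").split(",") if p]

# Logging level; the INFO fallback here is an assumption.
verbosity = os.getenv("VERBOSITY", "INFO")
```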
Quick Install
# Create a .env file (never commit this!)
cat > .env << 'EOF'
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxx
EOF
# Or export directly
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
Code Evidence
HF_TOKEN usage from lmms_eval/__main__.py:557-558:
if os.environ.get("HF_TOKEN", None):
args.hf_hub_log_args += f",token={os.environ.get('HF_TOKEN')}"
Cache directory configuration from lmms_eval/api/model.py:22-23:
LMMS_EVAL_HOME = os.path.expanduser(os.getenv("LMMS_EVAL_HOME", "~/.cache/lmms-eval"))
LMMS_EVAL_USE_CACHE = os.getenv("LMMS_EVAL_USE_CACHE", "False")
Plugin loading from lmms_eval/__main__.py:595-599:
if os.environ.get("LMMS_EVAL_PLUGINS", None):
args.include_path = [args.include_path] if args.include_path else []
for plugin in os.environ["LMMS_EVAL_PLUGINS"].split(","):
package_tasks_location = importlib.util.find_spec(f"{plugin}.tasks").submodule_search_locations[0]
args.include_path.append(package_tasks_location)
Dotenv loading in model providers from lmms_eval/models/chat/async_openai.py:20,27:
from dotenv import load_dotenv
load_dotenv(verbose=True)
LLM judge retry configuration from lmms_eval/llm_judge/protocol.py:4-19:
DEFAULT_NUM_RETRIES = 5
DEFAULT_RETRY_DELAY = 10 # seconds
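A retry loop built on these constants might look like the following (a hypothetical sketch, not the framework's actual implementation; call_judge stands in for the real judge request function):

```python
import time

DEFAULT_NUM_RETRIES = 5
DEFAULT_RETRY_DELAY = 10  # seconds


def with_retries(call_judge, num_retries=DEFAULT_NUM_RETRIES, delay=DEFAULT_RETRY_DELAY):
    """Call call_judge(), retrying on failure with a fixed delay between attempts."""
    last_error = None
    for attempt in range(num_retries):
        try:
            return call_judge()
        except Exception as exc:  # e.g. rate limits or transient network errors
            last_error = exc
            if attempt < num_retries - 1:
                time.sleep(delay)
    raise last_error
```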
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| 401 Unauthorized from HuggingFace | Missing or invalid HF_TOKEN | Set HF_TOKEN with a valid token from huggingface.co/settings/tokens |
| openai.AuthenticationError | Missing OPENAI_API_KEY | Set OPENAI_API_KEY environment variable |
| GPT judge returns empty response | API rate limiting | Framework retries 5 times with 10s delay; increase API tier if persistent |
| wandb: ERROR login failure | Missing WANDB_API_KEY | Run wandb login or set WANDB_API_KEY |
Compatibility Notes
- dotenv files: The framework loads .env files automatically via python-dotenv. Place the file in the working directory.
- GPT judges: Many task benchmarks (ActivityNetQA, VDC, WildVisionBench, etc.) require OPENAI_API_KEY for GPT-based evaluation. Without it, these tasks will fail.
- W&B logging: Optional. Only required when using the --wandb_args CLI flag.
- HF Hub push: Only required when using push_results_to_hub or push_samples_to_hub in evaluation tracker args.
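Since different runs need different subsets of these variables, a small pre-flight check can confirm that the required ones are set before evaluation starts (an illustrative helper, not part of lmms-eval):

```python
import os


def check_credentials(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


# Example: a GPT-judged task that pushes results to the Hub needs both of these.
missing = check_credentials(["HF_TOKEN", "OPENAI_API_KEY"])
if missing:
    print("Missing credentials:", ", ".join(missing))
```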