
Environment: EvolvingLMMs-Lab lmms-eval API Credentials Environment

From Leeroopedia
Domains: Infrastructure, API_Integration
Last Updated: 2026-02-14 00:00 GMT

Overview

API credentials and environment variables required for external service integrations including HuggingFace Hub, OpenAI GPT judges, W&B logging, and cloud model providers.

Description

This environment defines all API keys, tokens, and configuration environment variables used by the lmms-eval framework. These are needed for: (1) accessing gated models and datasets on HuggingFace Hub, (2) using OpenAI or Azure GPT models as evaluation judges, (3) logging evaluation results to Weights & Biases, and (4) running evaluations against cloud-hosted model APIs (Gemini, Reka, xAI Grok, etc.). The framework uses python-dotenv to load credentials from .env files.
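The dotenv loading pattern can be sketched with a stdlib-only stand-in for load_dotenv (illustrative only; the framework uses the real python-dotenv package, which also handles quoting and variable expansion):

```python
import os

def load_env_file(path=".env"):
    """Tiny stand-in for python-dotenv's load_dotenv (illustrative):
    parse KEY=VALUE lines into os.environ without overwriting
    variables that are already set."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Because existing environment variables win over the file, exporting a key in the shell always takes precedence over the .env entry.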

Usage

Use this environment when running evaluations that require external API access. This includes tasks using GPT-based judges (e.g., ActivityNetQA, VDC, WildVisionBench), logging to W&B, pushing results to HuggingFace Hub, or evaluating cloud-hosted models.

System Requirements

  • Network: internet access is required for API calls to external services.
  • Credentials: API keys must be available in the environment or a .env file. Never commit actual keys to source control.

Dependencies

Python Packages

  • python-dotenv — Loads .env files automatically
  • openai — OpenAI API client for GPT judges
  • wandb >= 0.16.0 — Weights & Biases logging
  • httpx >= 0.23.3 — HTTP client for API providers

Credentials

NEVER store actual secret values in code or documentation. The following environment variables must be set:

HuggingFace:

  • HF_TOKEN: HuggingFace API token for accessing gated models/datasets and pushing results to Hub.

OpenAI / Azure:

  • OPENAI_API_KEY: OpenAI API key for GPT-based evaluation judges and direct model evaluation.
  • AZURE_API_KEY: Azure OpenAI API key (alternative to direct OpenAI).

Logging:

  • WANDB_API_KEY: Weights & Biases API key for experiment logging.

Task-Specific:

  • VIESCORE_API_KEY: VIEScore API key for GEdit-Bench image editing evaluation.

Framework Configuration:

  • LMMS_EVAL_HOME: Cache directory path (default: ~/.cache/lmms-eval).
  • LMMS_EVAL_USE_CACHE: Enable response caching (default: "False").
  • LMMS_EVAL_PLUGINS: Comma-separated list of plugin package paths.
  • LMMS_EVAL_SHUFFLE_DOCS: Shuffle dataset documents (used by some tasks).
  • VERBOSITY: Logging level (DEBUG, INFO, WARNING, ERROR).
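Reading these variables with their documented defaults might look like the following sketch (read_config is a hypothetical helper written for illustration; the framework reads os.environ directly, as the Code Evidence section below shows):

```python
import os

def read_config(env):
    """Illustrative reader for the framework configuration variables,
    applying the documented defaults. `env` is a mapping such as
    os.environ or a plain dict."""
    return {
        "home": os.path.expanduser(env.get("LMMS_EVAL_HOME", "~/.cache/lmms-eval")),
        # The variable holds a string; interpret common truthy spellings
        "use_cache": env.get("LMMS_EVAL_USE_CACHE", "False").lower() in ("1", "true", "yes"),
        # Comma-separated plugin list; drop empty entries
        "plugins": [p for p in env.get("LMMS_EVAL_PLUGINS", "").split(",") if p],
        "verbosity": env.get("VERBOSITY", "INFO"),
    }
```

Passing a plain dict instead of os.environ makes the defaults easy to exercise in tests.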

Quick Install

# Create a .env file (never commit this!)
cat > .env << 'EOF'
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxx
EOF

# Or export directly
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
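Before launching a run, a quick pre-flight check can catch missing credentials early instead of failing partway through an evaluation. missing_vars is a hypothetical helper, not part of lmms-eval:

```python
import os

def missing_vars(required, env=None):
    """Return the names in `required` that are unset or empty.
    Hypothetical helper for pre-run checks; lmms-eval itself surfaces
    missing credentials later as 401/authentication errors."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Typical pre-run check before a GPT-judged task:
# missing = missing_vars(["HF_TOKEN", "OPENAI_API_KEY"])
# if missing:
#     raise SystemExit(f"Set these variables first: {missing}")
```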

Code Evidence

HF_TOKEN usage from lmms_eval/__main__.py:557-558:

if os.environ.get("HF_TOKEN", None):
    args.hf_hub_log_args += f",token={os.environ.get('HF_TOKEN')}"

Cache directory configuration from lmms_eval/api/model.py:22-23:

LMMS_EVAL_HOME = os.path.expanduser(os.getenv("LMMS_EVAL_HOME", "~/.cache/lmms-eval"))
LMMS_EVAL_USE_CACHE = os.getenv("LMMS_EVAL_USE_CACHE", "False")

Plugin loading from lmms_eval/__main__.py:595-599:

if os.environ.get("LMMS_EVAL_PLUGINS", None):
    args.include_path = [args.include_path] if args.include_path else []
    for plugin in os.environ["LMMS_EVAL_PLUGINS"].split(","):
        package_tasks_location = importlib.util.find_spec(f"{plugin}.tasks").submodule_search_locations[0]
        args.include_path.append(package_tasks_location)
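The discovery step above relies on importlib.util.find_spec returning a package's on-disk location. A minimal demonstration of that mechanism, using the stdlib email package as a stand-in for a `<plugin>.tasks` package:

```python
import importlib.util

# find_spec on an importable package returns a ModuleSpec whose
# submodule_search_locations holds the package directory; the framework
# appends that directory to its task include path.
spec = importlib.util.find_spec("email")  # stand-in for f"{plugin}.tasks"
tasks_dir = spec.submodule_search_locations[0]
print(tasks_dir)  # the directory containing the package's modules
```

If a plugin name in LMMS_EVAL_PLUGINS is not importable, find_spec returns None and the indexing raises, so plugin packages must be installed in the same environment as lmms-eval.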

Dotenv loading in model providers from lmms_eval/models/chat/async_openai.py:20,27:

from dotenv import load_dotenv
load_dotenv(verbose=True)

LLM judge retry configuration from lmms_eval/llm_judge/protocol.py:4-19:

DEFAULT_NUM_RETRIES = 5
DEFAULT_RETRY_DELAY = 10  # seconds
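A retry loop using these defaults might look like the sketch below. call_with_retries is illustrative, not the framework's actual wrapper; the injectable sleep parameter just makes it testable without real delays:

```python
import time

DEFAULT_NUM_RETRIES = 5
DEFAULT_RETRY_DELAY = 10  # seconds

def call_with_retries(fn, num_retries=DEFAULT_NUM_RETRIES,
                      delay=DEFAULT_RETRY_DELAY, sleep=time.sleep):
    """Illustrative retry wrapper matching the defaults above: call fn()
    up to num_retries times, sleeping `delay` seconds between attempts,
    and re-raise the last exception if all attempts fail."""
    last_exc = None
    for attempt in range(num_retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < num_retries - 1:
                sleep(delay)
    raise last_exc
```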

Common Errors

  • 401 Unauthorized from HuggingFace: missing or invalid HF_TOKEN. Set HF_TOKEN with a valid token from huggingface.co/settings/tokens.
  • openai.AuthenticationError: missing OPENAI_API_KEY. Set the OPENAI_API_KEY environment variable.
  • GPT judge returns an empty response: API rate limiting. The framework retries 5 times with a 10 s delay; upgrade your API tier if the problem persists.
  • wandb: ERROR login failure: missing WANDB_API_KEY. Run wandb login or set WANDB_API_KEY.

Compatibility Notes

  • dotenv files: The framework loads .env files automatically via python-dotenv. Place the file in the working directory.
  • GPT judges: Many task benchmarks (ActivityNetQA, VDC, WildVisionBench, etc.) require OPENAI_API_KEY for GPT-based evaluation. Without it, these tasks will fail.
  • W&B logging: Optional. Only required when using --wandb_args CLI flag.
  • HF Hub push: Only required when using push_results_to_hub or push_samples_to_hub in evaluation tracker args.
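The notes above can be condensed into a hypothetical pre-flight map from optional features to the credential each one needs (FEATURE_VARS and preflight are illustrative names, not framework APIs):

```python
# Map each optional feature to the environment variable it requires.
FEATURE_VARS = {
    "gpt_judge": "OPENAI_API_KEY",     # GPT-judged tasks (ActivityNetQA, VDC, ...)
    "wandb_logging": "WANDB_API_KEY",  # only with --wandb_args
    "hub_push": "HF_TOKEN",            # only with push_results_to_hub / push_samples_to_hub
}

def preflight(enabled_features, env):
    """Return {feature: missing_variable} for every enabled feature whose
    required variable is unset or empty in `env`."""
    return {f: FEATURE_VARS[f]
            for f in enabled_features
            if not env.get(FEATURE_VARS[f])}
```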
