Environment:Open compass VLMEvalKit Data Storage Environment

From Leeroopedia
Knowledge Sources
Domains: Infrastructure, Data_Management
Last Updated: 2026-02-14 01:30 GMT

Overview

Data storage layout for VLMEvalKit covering dataset caching (`LMUData`), HuggingFace model cache, and evaluation output directories.

Description

VLMEvalKit uses three primary storage locations:

  • LMUData root: dataset TSV files and decoded images. Set via the `LMUData` environment variable, which is honored only if the path already exists; otherwise the default `~/LMUData` is used.
  • HuggingFace cache: downloaded model weights. Uses the standard `~/.cache/huggingface/hub` path unless `HUGGINGFACE_HUB_CACHE` or `HF_HOME` (checked in that order) points to an existing directory; `hub` is appended when the path does not already end in it.
  • Output directory: predictions and evaluation results. Defaults to `./outputs` and is overridden by `MMEVAL_ROOT`.

The framework decodes base64-encoded images from the TSV files to local disk in parallel, using a 32-worker multiprocessing pool.

Usage

Use this environment when running any evaluation workflow. Datasets are downloaded automatically and cached on first use. Ensure sufficient disk space for the target datasets, especially multi-image and video benchmarks, which can be large.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Disk (Datasets) | 10GB+, SSD recommended | Varies greatly by benchmark; video datasets can be 100GB+ |
| Disk (Models) | 5-140GB+ per model | Depends on model size (7B ~ 14GB, 70B ~ 140GB, i.e. roughly 2 bytes per parameter at 16-bit precision) |
| Disk (Output) | 1GB+ | Prediction files in xlsx/tsv/json format |
| I/O | High IOPS recommended | Image decoding uses 32-process parallelism (`vlmeval/smp/file.py:60`) |
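
A quick way to compare the disk figures above against a machine is to query free space on the filesystems that hold the dataset and output roots. The following is a minimal sketch, not part of VLMEvalKit: `free_gib` is a hypothetical helper, and the thresholds are simply the lower bounds from the table.

```python
import os
import shutil

def free_gib(path: str) -> float:
    """Return free space in GiB on the filesystem that holds `path`.

    Walks up to the nearest existing ancestor so that a directory which has
    not been created yet can still be checked.
    """
    probe = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return shutil.disk_usage(probe).free / 2**30

# Illustrative thresholds (GiB), taken from the lower bounds in the table.
checks = {
    os.environ.get("LMUData", "~/LMUData"): 10,     # datasets
    os.environ.get("MMEVAL_ROOT", "./outputs"): 1,  # predictions and results
}
for location, needed in checks.items():
    available = free_gib(location)
    status = "ok" if available >= needed else "LOW"
    print(f"{location}: {available:.1f} GiB free (want >= {needed} GiB) [{status}]")
```

Video benchmarks and large model checkpoints need far more than these lower bounds, so the thresholds should be raised to match the actual workload.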

Dependencies

Python Packages

  • `huggingface_hub` (model/dataset downloading and cache management)
  • `pandas` (TSV/xlsx file I/O)
  • `pillow` (image decoding)
  • `openpyxl` (xlsx output format)
  • `xlsxwriter` (xlsx writing)

Credentials

The following environment variables configure storage locations:

  • `LMUData`: Root directory for dataset files. Default: `~/LMUData` (`vlmeval/smp/file.py:70-75`).
  • `HUGGINGFACE_HUB_CACHE`: HuggingFace model cache directory (`vlmeval/smp/file.py:79`).
  • `HF_HOME`: Alternative HuggingFace home directory (`vlmeval/smp/file.py:79`).
  • `MMEVAL_ROOT`: Override for evaluation output directory. Default: `./outputs` (`run.py:221-222`).
  • `PRED_FORMAT`: Prediction output format (`tsv`, `xlsx`, `json`). Default: `xlsx` (`vlmeval/smp/file.py:174`).
  • `EVAL_FORMAT`: Evaluation result format (`csv`, `json`). Default: `csv` (`vlmeval/smp/file.py:183`).
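
When an evaluation is launched from another Python process, the variables above can be assembled into an environment mapping first. This is a hedged sketch: `build_storage_env` is a hypothetical helper, not a VLMEvalKit function, and it creates the directories up front because `LMUData` is only honored when the path exists.

```python
import os

def build_storage_env(data_root, output_root, pred_format="xlsx", base_env=None):
    """Return a copy of the environment with the storage variables set.

    Hypothetical helper for illustration; variable names follow the list above.
    """
    if pred_format not in ("tsv", "xlsx", "json"):
        raise ValueError(f"Unsupported PRED_FORMAT {pred_format!r}")
    env = dict(os.environ if base_env is None else base_env)
    for key, path in (("LMUData", data_root), ("MMEVAL_ROOT", output_root)):
        path = os.path.abspath(os.path.expanduser(path))
        os.makedirs(path, exist_ok=True)  # make sure the directory exists
        env[key] = path
    env["PRED_FORMAT"] = pred_format
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting the evaluation script, which keeps the settings out of the parent shell.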

Quick Install

# Create default data directory
mkdir -p ~/LMUData

# Or set a custom location (the directory must exist, otherwise ~/LMUData is used)
mkdir -p /data/vlmeval/datasets
export LMUData=/data/vlmeval/datasets

# Set custom output directory
export MMEVAL_ROOT=/data/vlmeval/outputs

Code Evidence

LMUData root resolution from `vlmeval/smp/file.py:69-75`:

def LMUDataRoot():
    if 'LMUData' in os.environ and osp.exists(os.environ['LMUData']):
        return os.environ['LMUData']
    home = osp.expanduser('~')
    root = osp.join(home, 'LMUData')
    os.makedirs(root, exist_ok=True)
    return root
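
One consequence of the excerpt above is that a misspelled or not-yet-created `LMUData` path is ignored without warning, and datasets end up under the home directory. The sketch below predicts which root would be chosen and why; `lmudata_preflight` is a hypothetical name and, unlike the original, it does not create any directory.

```python
import os
import os.path as osp

def lmudata_preflight(env=None):
    """Predict the dataset root chosen by the logic above, with a reason."""
    env = os.environ if env is None else env
    default_root = osp.join(osp.expanduser("~"), "LMUData")
    configured = env.get("LMUData")
    if configured is None:
        return default_root, "LMUData is unset; using the default root"
    if osp.exists(configured):
        return configured, "LMUData is set and exists"
    return default_root, (
        f"LMUData={configured!r} does not exist; falling back to the default root"
    )

root, reason = lmudata_preflight()
print(f"{root} ({reason})")
```

Running this before a long evaluation makes the fallback visible instead of discovering it when the home partition fills up.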

HuggingFace cache resolution from `vlmeval/smp/file.py:78-89`:

def HFCacheRoot():
    cache_list = ['HUGGINGFACE_HUB_CACHE', 'HF_HOME']
    for cache_name in cache_list:
        if cache_name in os.environ and osp.exists(os.environ[cache_name]):
            if os.environ[cache_name].split('/')[-1] == 'hub':
                return os.environ[cache_name]
            else:
                return osp.join(os.environ[cache_name], 'hub')
    home = osp.expanduser('~')
    root = osp.join(home, '.cache', 'huggingface', 'hub')
    os.makedirs(root, exist_ok=True)
    return root
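
To see how the lookup above treats different settings, the same rules can be mirrored in a function that takes the environment as an argument and has no side effects. This is an illustrative sketch (`resolve_hf_cache` is a hypothetical name), assuming POSIX-style paths since the original compares the text after the last `/`.

```python
import os
import os.path as osp
import tempfile

def resolve_hf_cache(env):
    """Side-effect-free mirror of the lookup above: same order, no makedirs."""
    for name in ("HUGGINGFACE_HUB_CACHE", "HF_HOME"):
        value = env.get(name)
        if value is not None and osp.exists(value):
            # Only the text after the last '/' is compared with 'hub'.
            if value.split("/")[-1] == "hub":
                return value
            return osp.join(value, "hub")
    return osp.join(osp.expanduser("~"), ".cache", "huggingface", "hub")

with tempfile.TemporaryDirectory() as home:
    hub = osp.join(home, "hub")
    os.makedirs(hub)
    print(resolve_hf_cache({"HF_HOME": home}))                # "hub" is appended
    print(resolve_hf_cache({"HUGGINGFACE_HUB_CACHE": hub}))   # returned unchanged
    # A trailing slash hides the final "hub" component, so "hub" is appended again.
    print(resolve_hf_cache({"HUGGINGFACE_HUB_CACHE": hub + "/"}))
```

The last case suggests writing these variables without a trailing slash, and the second shows that a cache directory not named `hub` gets a `hub` subdirectory appended by this function.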

MMEVAL_ROOT override from `run.py:221-222`:

if 'MMEVAL_ROOT' in os.environ:
    args.work_dir = os.environ['MMEVAL_ROOT']

Parallel image decoding from `vlmeval/smp/file.py:60-61`:

pool = mp.Pool(32)
ret = pool.map(decode_img_omni, tups)

Prediction format selection from `vlmeval/smp/file.py:173-179`:

def get_pred_file_format():
    pred_format = os.getenv('PRED_FORMAT', '').lower()
    if pred_format == '':
        return 'xlsx'  # default format
    else:
        assert pred_format in ['tsv', 'xlsx', 'json'], f'Unsupported PRED_FORMAT {pred_format}'
        return pred_format
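
Two details of the excerpt above are easy to miss: the value is lowercased but not stripped of whitespace, and the check is an `assert`, which Python skips when run with `-O`. A launcher script could validate the setting itself before starting a run. The following sketch uses a hypothetical `checked_pred_format` helper and raises `ValueError` instead of asserting.

```python
def checked_pred_format(value):
    """Validate a PRED_FORMAT value the way the excerpt above interprets it.

    `value` is the raw environment string, or None when the variable is unset.
    """
    fmt = (value or "").lower()
    if fmt == "":
        return "xlsx"  # same default as the original
    if fmt not in ("tsv", "xlsx", "json"):
        raise ValueError(f"Unsupported PRED_FORMAT {fmt!r}")
    return fmt

for raw in (None, "JSON", "tsv"):
    print(repr(raw), "->", checked_pred_format(raw))
```

A value such as `"json "` with trailing whitespace is rejected here, just as it would fail the original assertion.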

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `OSError: [Errno 28] No space left on device` | Disk full | Free space or change `LMUData`/`MMEVAL_ROOT` to a larger partition |
| `FileNotFoundError: ~/LMUData/...` | Dataset not downloaded | Re-run evaluation; datasets auto-download on first access |
| `PermissionError` on cache directory | No write permission | Check permissions on `LMUData` and HF cache directories |
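
The `PermissionError` case can be caught before a run by attempting a throwaway write in each storage directory. This is a minimal sketch with a hypothetical `is_writable` helper; the directories checked are the defaults named on this page, and a missing directory is reported the same way as an unwritable one.

```python
import os
import tempfile

def is_writable(directory):
    """Return True if a temporary file can be created inside `directory`."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:  # covers PermissionError and a missing directory
        return False

candidates = [
    os.path.expanduser(os.environ.get("LMUData", "~/LMUData")),
    os.path.expanduser("~/.cache/huggingface/hub"),
]
for directory in candidates:
    state = "writable" if is_writable(directory) else "not writable or missing"
    print(f"{directory}: {state}")
```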

Compatibility Notes

  • Symlinks for results: After evaluation, prediction files are symlinked from the timestamped output directory to the model root directory for easy access (`run.py:483-491`).
  • Fallback to pickle: If output format serialization fails (e.g., xlsx write error), the framework automatically falls back to pickle format (`vlmeval/smp/file.py:167-170`).
  • ModelScope alternative: When `VLMEVALKIT_USE_MODELSCOPE=1`, dataset downloads use ModelScope cache instead of HuggingFace (`vlmeval/smp/misc.py:83-88`).
