Environment:Open compass VLMEvalKit Data Storage Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Data_Management |
| Last Updated | 2026-02-14 01:30 GMT |
Overview
Data storage layout for VLMEvalKit covering dataset caching (`LMUData`), HuggingFace model cache, and evaluation output directories.
Description
VLMEvalKit uses three primary storage locations: (1) the LMUData root for dataset TSV files and decoded images, set via the `LMUData` environment variable and defaulting to `~/LMUData`; (2) the HuggingFace cache for downloaded model weights, taken from `HUGGINGFACE_HUB_CACHE` or `HF_HOME` when set and otherwise the standard `~/.cache/huggingface/hub` path; and (3) the output directory for predictions and evaluation results, which defaults to `./outputs` and is overridden by `MMEVAL_ROOT`. `LMUData`, `HUGGINGFACE_HUB_CACHE`, and `HF_HOME` are honored only if the path they name already exists; otherwise the default location is used. The framework decodes base64-encoded images from TSV files to local disk in parallel using a 32-worker multiprocessing pool.
Usage
Use this environment when running any evaluation workflow. Datasets are automatically downloaded and cached on first use. Ensure sufficient disk space for the target datasets, especially multi-image and video benchmarks, which can exceed 100GB.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Disk (Datasets) | 10GB+ SSD recommended | Varies greatly by benchmark; video datasets can be 100GB+ |
| Disk (Models) | 5-140GB+ per model | Depends on model size; about 2 bytes per parameter at 16-bit precision (7B ~ 14GB, 70B ~ 140GB) |
| Disk (Output) | 1GB+ | Prediction files in xlsx/tsv/json format |
| I/O | High IOPS recommended | Image decoding uses 32-process parallelism (`vlmeval/smp/file.py:60`) |
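The model sizes in the table follow from roughly 2 bytes per parameter at 16-bit precision. A minimal sketch of a pre-flight disk check, assuming that rule of thumb, could look like the following. The helper names `estimate_weights_gb` and `has_room_for` are hypothetical and not part of VLMEvalKit.

```python
import shutil

# Assumption: weights are stored at 16-bit precision (fp16/bf16).
BYTES_PER_PARAM_16BIT = 2


def estimate_weights_gb(num_params_billions: float) -> float:
    """Approximate on-disk size of 16-bit weights in GB (1 GB = 1e9 bytes)."""
    return num_params_billions * 1e9 * BYTES_PER_PARAM_16BIT / 1e9


def has_room_for(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    need = estimate_weights_gb(7)
    print(f"7B model needs about {need:.0f} GB; enough space here: {has_room_for('.', need)}")
```

The estimate covers weights only; dataset TSV files and decoded images need their own headroom.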
Dependencies
Python Packages
- `huggingface_hub` (model/dataset downloading and cache management)
- `pandas` (TSV/xlsx file I/O)
- `pillow` (image decoding)
- `openpyxl` (xlsx output format)
- `xlsxwriter` (xlsx writing)
Credentials
The following environment variables configure storage locations:
- `LMUData`: Root directory for dataset files. Default: `~/LMUData` (`vlmeval/smp/file.py:70-75`).
- `HUGGINGFACE_HUB_CACHE`: HuggingFace model cache directory (`vlmeval/smp/file.py:79`).
- `HF_HOME`: Alternative HuggingFace home directory (`vlmeval/smp/file.py:79`).
- `MMEVAL_ROOT`: Override for evaluation output directory. Default: `./outputs` (`run.py:221-222`).
- `PRED_FORMAT`: Prediction output format (`tsv`, `xlsx`, `json`). Default: `xlsx` (`vlmeval/smp/file.py:174`).
- `EVAL_FORMAT`: Evaluation result format (`csv`, `json`). Default: `csv` (`vlmeval/smp/file.py:183`).
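To see which values a run would pick up, the variables above can be read with their documented defaults. This is only an illustrative sketch: the helper `effective_settings` is hypothetical, and it does not replicate the framework's extra checks (for example, `LMUData` being ignored when the path does not exist, or `PRED_FORMAT` validation).

```python
import os
from typing import Mapping

# Defaults as documented in the list above.
DEFAULTS = {
    "LMUData": os.path.join(os.path.expanduser("~"), "LMUData"),
    "MMEVAL_ROOT": "./outputs",
    "PRED_FORMAT": "xlsx",
    "EVAL_FORMAT": "csv",
}


def effective_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return each variable's value from `env`, or its documented default if unset."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in effective_settings().items():
        print(f"{name} = {value}")
```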
Quick Install
# Create default data directory
mkdir -p ~/LMUData
# Or set custom location
export LMUData=/data/vlmeval/datasets
# Set custom output directory
export MMEVAL_ROOT=/data/vlmeval/outputs
Code Evidence
LMUData root resolution from `vlmeval/smp/file.py:69-75`:
def LMUDataRoot():
    if 'LMUData' in os.environ and osp.exists(os.environ['LMUData']):
        return os.environ['LMUData']
    home = osp.expanduser('~')
    root = osp.join(home, 'LMUData')
    os.makedirs(root, exist_ok=True)
    return root
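One consequence of the `osp.exists` check is that an `LMUData` value pointing at a missing directory is skipped without any message, and the default root is used instead. A small sketch that reports which case applies is shown below; `lmudata_status` is a hypothetical helper written for this page, not a VLMEvalKit function.

```python
import os
import os.path as osp


def lmudata_status(env=os.environ) -> str:
    """Describe how the LMUData variable would be treated by the resolution logic above."""
    value = env.get("LMUData")
    if value is None:
        return "unset: default ~/LMUData is used"
    if not osp.exists(value):
        return f"ignored: {value} does not exist, default ~/LMUData is used"
    return f"honored: {value}"


if __name__ == "__main__":
    print(lmudata_status())
```

Creating the target directory before exporting `LMUData` avoids the silent fallback.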
HuggingFace cache resolution from `vlmeval/smp/file.py:78-89`:
def HFCacheRoot():
    cache_list = ['HUGGINGFACE_HUB_CACHE', 'HF_HOME']
    for cache_name in cache_list:
        if cache_name in os.environ and osp.exists(os.environ[cache_name]):
            if os.environ[cache_name].split('/')[-1] == 'hub':
                return os.environ[cache_name]
            else:
                return osp.join(os.environ[cache_name], 'hub')
    home = osp.expanduser('~')
    root = osp.join(home, '.cache', 'huggingface', 'hub')
    os.makedirs(root, exist_ok=True)
    return root
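The suffix rule in this function can be tried in isolation. The sketch below copies only the `'hub'` check into a hypothetical helper, `hub_dir_for`, and leaves out the environment lookup and existence test. It assumes POSIX-style paths.

```python
import os.path as osp


def hub_dir_for(cache_value: str) -> str:
    """Mirror the suffix rule above: keep the path if it ends in 'hub', else append 'hub'."""
    if cache_value.split('/')[-1] == 'hub':
        return cache_value
    return osp.join(cache_value, 'hub')


print(hub_dir_for('/data/hf'))       # /data/hf/hub
print(hub_dir_for('/data/hf/hub'))   # /data/hf/hub
print(hub_dir_for('/data/hf/hub/'))  # /data/hf/hub/hub (trailing slash defeats the check)
```

Because the last path component after a trailing slash is an empty string, it is safer to set these variables without a trailing `/`.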
MMEVAL_ROOT override from `run.py:221-222`:
if 'MMEVAL_ROOT' in os.environ:
    args.work_dir = os.environ['MMEVAL_ROOT']
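Since the assignment is unconditional, `MMEVAL_ROOT` takes precedence over whatever output directory was parsed from the command line. The sketch below illustrates that precedence. It assumes the output directory is exposed as a `--work-dir` argument with default `./outputs`; the function `resolve_work_dir` is hypothetical.

```python
import argparse


def resolve_work_dir(argv, env):
    """Return the output directory after applying the MMEVAL_ROOT override."""
    parser = argparse.ArgumentParser()
    # Assumption: the CLI exposes the output directory as --work-dir.
    parser.add_argument('--work-dir', type=str, default='./outputs')
    args = parser.parse_args(argv)
    # Same precedence as the snippet above: the environment variable wins when set.
    if 'MMEVAL_ROOT' in env:
        args.work_dir = env['MMEVAL_ROOT']
    return args.work_dir


print(resolve_work_dir([], {}))                                                 # ./outputs
print(resolve_work_dir(['--work-dir', '/tmp/a'], {}))                           # /tmp/a
print(resolve_work_dir(['--work-dir', '/tmp/a'], {'MMEVAL_ROOT': '/data/out'})) # /data/out
```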
Parallel image decoding from `vlmeval/smp/file.py:60-61`:
pool = mp.Pool(32)
ret = pool.map(decode_img_omni, tups)
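The body of `decode_img_omni` is not shown here, so the following is only a rough sketch of the general pattern: a worker that decodes one base64 string to a file, mapped over a list of tuples by a process pool. The worker `decode_to_file`, the wrapper `decode_all`, and the `(base64_string, path)` tuple layout are assumptions made for illustration.

```python
import base64
import multiprocessing as mp
import os
import tempfile


def decode_to_file(tup):
    """Decode one (base64_string, target_path) pair to disk; skip files that already exist."""
    b64_string, path = tup
    if not os.path.exists(path):
        with open(path, 'wb') as f:
            f.write(base64.b64decode(b64_string))
    return path


def decode_all(tups, workers=32):
    """Decode all pairs in parallel; the pool is shut down when the block exits."""
    with mp.Pool(workers) as pool:
        return pool.map(decode_to_file, tups)


if __name__ == '__main__':
    out_dir = tempfile.mkdtemp()
    payload = base64.b64encode(b'not a real image').decode()
    tups = [(payload, os.path.join(out_dir, f'{i}.bin')) for i in range(4)]
    print(decode_all(tups, workers=2))
```

With 32 workers writing at once, throughput depends mostly on the storage device, which is why high IOPS is recommended for the dataset directory.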
Prediction format selection from `vlmeval/smp/file.py:173-179`:
def get_pred_file_format():
    pred_format = os.getenv('PRED_FORMAT', '').lower()
    if pred_format == '':
        return 'xlsx'  # default format
    else:
        assert pred_format in ['tsv', 'xlsx', 'json'], f'Unsupported PRED_FORMAT {pred_format}'
        return pred_format
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `OSError: [Errno 28] No space left on device` | Disk full | Free space or change `LMUData`/`MMEVAL_ROOT` to a larger partition |
| `FileNotFoundError: ~/LMUData/...` | Dataset not downloaded | Re-run evaluation; datasets auto-download on first access |
| `PermissionError` on cache directory | No write permission | Check permissions on `LMUData` and HF cache directories |
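The permission and missing-directory cases in the table can be caught before a long run starts. The following is a minimal sketch using only the standard library; `first_unwritable` is a hypothetical helper, and the directory list in the usage section assumes the defaults documented on this page.

```python
import os


def first_unwritable(paths):
    """Return the first path that is missing or not writable by the current user, else None."""
    for path in paths:
        if not os.path.isdir(path) or not os.access(path, os.W_OK | os.X_OK):
            return path
    return None


if __name__ == '__main__':
    candidates = [
        os.environ.get('LMUData', os.path.expanduser('~/LMUData')),
        os.environ.get('MMEVAL_ROOT', './outputs'),
    ]
    problem = first_unwritable(candidates)
    print('all directories writable' if problem is None else f'check: {problem}')
```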
Compatibility Notes
- Symlinks for results: After evaluation, prediction files are symlinked from the timestamped output directory to the model root directory for easy access (`run.py:483-491`).
- Fallback to pickle: If output format serialization fails (e.g., xlsx write error), the framework automatically falls back to pickle format (`vlmeval/smp/file.py:167-170`).
- ModelScope alternative: When `VLMEVALKIT_USE_MODELSCOPE=1`, dataset downloads use ModelScope cache instead of HuggingFace (`vlmeval/smp/misc.py:83-88`).
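The pickle fallback described above follows a common pattern: attempt the preferred format, and if serialization fails, write a pickle so the results are not lost. The sketch below illustrates that pattern only and is not the framework's code. It uses JSON as a stand-in for xlsx to stay within the standard library, and `dump_with_fallback` is a hypothetical name.

```python
import json
import pickle


def dump_with_fallback(obj, path):
    """Write `obj` as JSON to `path`; if it cannot be serialized, write `path + '.pkl'` instead.

    Returns the path that was actually written.
    """
    try:
        # Serialize fully before opening the file so a failure leaves no partial output.
        text = json.dumps(obj)
    except (TypeError, ValueError):
        fallback = path + '.pkl'
        with open(fallback, 'wb') as f:
            pickle.dump(obj, f)
        return fallback
    with open(path, 'w') as f:
        f.write(text)
    return path
```

Callers should check the returned path, since a fallback file has a different extension and must be read back with `pickle.load`.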