Implementation:Open compass VLMEvalKit ScreenSpot
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, Benchmarking, GUI Grounding |
Overview
Benchmark dataset implementation for ScreenSpot GUI element grounding evaluation in VLMEvalKit.
Description
ScreenSpot inherits from ImageBaseDataset and implements the ScreenSpot benchmark for GUI agent evaluation, testing the ability to locate UI elements on Mobile, Desktop, and Web screenshots. The TYPE field is set to 'GUI'. It supports both point-based and rectangle-based evaluation with functional or composite referring expressions, and provides multiple dataset splits (ScreenSpot_Mobile, ScreenSpot_Desktop, ScreenSpot_Web, plus v2 variants).
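The difference between point-based and rectangle-based evaluation can be illustrated with a minimal sketch. It is not taken from screenspot.py: the function names, the `(x1, y1, x2, y2)` box layout, and the use of an IoU threshold of 0.5 for rectangle predictions are all assumptions made for illustration, and the actual scoring logic in VLMEvalKit may differ.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) in the same coordinate space as the prediction.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def point_hit(point: Point, box: Box) -> bool:
    """Point-based check: the predicted point must fall inside the target box."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rectangle_hit(pred: Box, target: Box, threshold: float = 0.5) -> bool:
    """Rectangle-based check (assumed criterion): IoU must reach the threshold."""
    return box_iou(pred, target) >= threshold
```

Under these assumptions, a point prediction counts as correct whenever it lands anywhere inside the target element, while a rectangle prediction must also match the element's extent closely enough to clear the threshold.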
Usage
Registered in vlmeval/dataset/__init__.py and invoked through build_dataset() by benchmark name.
Code Reference
- Source: vlmeval/dataset/GUI/screenspot.py, Lines: L1-461
- Import: from vlmeval.dataset.GUI.screenspot import ScreenSpot
Signature:
class ScreenSpot(ImageBaseDataset):
    MODALITY = "IMAGE"
    TYPE = "GUI"
    DATASET_URL = {...}
    DATASET_MD5 = {...}
    EVAL_TYPE = "point"
    RE_TYPE = "functional"
    ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | TSV dataset file with screenshot images and GUI element grounding tasks |
| Outputs | Evaluation results DataFrame with accuracy scores per platform and data type |
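The "accuracy scores per platform and data type" in the output can be pictured as a grouped average over per-sample results. The sketch below is an illustration only: it uses plain dictionaries in place of the DataFrame, and the field names `platform`, `data_type`, and `hit`, as well as the example values, are hypothetical rather than taken from the VLMEvalKit source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def accuracy_by_group(records: Iterable[Mapping]) -> Dict[Tuple[str, str], float]:
    """Average the per-sample hit flag for each (platform, data_type) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, sample count]
    for rec in records:
        key = (rec["platform"], rec["data_type"])  # hypothetical field names
        totals[key][0] += int(rec["hit"])
        totals[key][1] += 1
    return {key: hits / count for key, (hits, count) in totals.items()}


# Hypothetical per-sample results, one entry per grounding task.
sample_results = [
    {"platform": "Mobile", "data_type": "text", "hit": True},
    {"platform": "Mobile", "data_type": "text", "hit": False},
    {"platform": "Web", "data_type": "icon", "hit": True},
]
scores = accuracy_by_group(sample_results)
```

Each key in `scores` corresponds to one cell of a platform-by-data-type accuracy table; the real evaluation output may use different column names and include additional aggregate rows.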
Usage Examples
from vlmeval.dataset import build_dataset
dataset = build_dataset('ScreenSpot_Mobile')