Principle:Open compass VLMEvalKit Image Encoding For TSV
| Field | Value |
|---|---|
| source | Repo |
| domain | Vision, Data_Processing |
Overview
An encoding scheme that stores images as base64 strings within TSV files for self-contained benchmark distribution.
Description
VLMEvalKit stores benchmark data in TSV (tab-separated values) files where images are encoded as base64 strings in the image column. This makes datasets self-contained — a single TSV file contains all questions, answer options, ground truth, and images.
The encode_image_to_base64() function handles encoding with configurable resizing:
- target_size — for thumbnailing
- max_size enforcement — via progressive downscaling
- min_edge enforcement — via upscaling
The decode_base64_to_image() function reverses the process. Because each dataset is a single file, it can be distributed from an HTTP URL and checked against an MD5 checksum after download.
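The encode and decode steps can be sketched with Pillow. This is a minimal illustration, not VLMEvalKit's implementation. The names encode_image_sketch and decode_image_sketch are hypothetical. Three further points are assumptions: passing max_size and min_edge as keyword arguments, treating max_size as a cap on the length of the base64 string, and halving both dimensions on each downscaling pass.

```python
import base64
import io

from PIL import Image


def encode_image_sketch(img, target_size=-1, max_size=None, min_edge=None, fmt="JPEG"):
    """Illustrative encoder; NOT VLMEvalKit's encode_image_to_base64()."""
    # JPEG cannot store alpha or palette data, so normalise to RGB first.
    if img.mode != "RGB":
        img = img.convert("RGB")

    if target_size > 0:
        img = img.copy()  # avoid mutating the caller's image
        # thumbnail() shrinks in place and preserves the aspect ratio.
        img.thumbnail((target_size, target_size))

    # Assumption: min_edge is a lower bound on the shorter side, in pixels.
    if min_edge is not None and min(img.size) < min_edge:
        scale = min_edge / min(img.size)
        img = img.resize((round(img.width * scale), round(img.height * scale)))

    def to_b64(image):
        buf = io.BytesIO()
        image.save(buf, format=fmt)
        return base64.b64encode(buf.getvalue()).decode("utf-8")

    encoded = to_b64(img)
    # Assumption: max_size caps the base64 string length; halve both
    # dimensions until the string fits or the image cannot shrink further.
    while max_size is not None and len(encoded) > max_size and min(img.size) > 1:
        img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))
        encoded = to_b64(img)
    return encoded


def decode_image_sketch(b64_string):
    """Illustrative decoder: base64 text back to a PIL image."""
    return Image.open(io.BytesIO(base64.b64decode(b64_string)))
```

A 640x480 image encoded with target_size=256 decodes to 256x192, because thumbnailing keeps the aspect ratio. JPEG is lossy, so the decoded pixels match the input only approximately.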
Usage
Use this scheme when preparing a new benchmark TSV file for VLMEvalKit: encode each image to base64 and store the resulting string in the image column of its row.
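As a rough sketch of that preparation step, the snippet below builds a one-row TSV using only the standard library, computes an MD5 checksum for it, and reads the image back. Only the image column name comes from the description above. The other column names, the build_tsv helper, and the placeholder bytes standing in for a real image file are assumptions made for illustration.

```python
import base64
import csv
import hashlib
import io

# Assumption: every column name except "image" is illustrative.
FIELDS = ["index", "question", "A", "B", "answer", "image"]


def build_tsv(rows):
    """Serialise rows to TSV text; each row's "image" value is raw bytes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        encoded = dict(row)
        # The base64 alphabet has no tabs or newlines, so it is safe in a TSV cell.
        encoded["image"] = base64.b64encode(row["image"]).decode("ascii")
        writer.writerow(encoded)
    return buf.getvalue()


# Placeholder bytes standing in for the contents of an image file.
placeholder = bytes(range(256))

tsv_text = build_tsv([
    {"index": 0, "question": "What colour is the square?",
     "A": "Red", "B": "Blue", "answer": "A", "image": placeholder},
])

# Checksum that a consumer could compare against after downloading the file.
checksum = hashlib.md5(tsv_text.encode("utf-8")).hexdigest()

# Reading the row back recovers the original bytes.
first_row = next(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
restored = base64.b64decode(first_row["image"])
```

Python's csv reader rejects fields longer than 131072 characters by default, so reading real image columns with it generally requires raising csv.field_size_limit first.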
Theoretical Basis
Base64 encoding converts binary image data to ASCII text, enabling storage in text-based formats like TSV. Every 3 bytes of input become 4 output characters, so the encoded data is about 33% larger than the binary. In exchange, the dataset is fully self-contained.
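The size overhead can be checked directly. In this small sketch, base64_length is a hypothetical helper, and random bytes stand in for an image file of roughly 300 kB.

```python
import base64
import os


def base64_length(n_bytes):
    """Length of padded base64 output for n_bytes of input."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


raw = os.urandom(300_000)  # stand-in for image file contents
encoded = base64.b64encode(raw)
overhead = len(encoded) / len(raw) - 1

print(f"{len(raw)} bytes -> {len(encoded)} characters (+{overhead:.1%})")
```

The ratio approaches 4/3 for any input larger than a few bytes, so the overhead does not depend on the image content or format.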