Principle:Open compass VLMEvalKit Image Encoding For TSV

From Leeroopedia
Field Value
source Repo
domain Vision, Data_Processing

Overview

An encoding scheme that stores images as base64 strings within TSV files for self-contained benchmark distribution.

Description

VLMEvalKit stores benchmark data in TSV (tab-separated values) files in which each image is encoded as a base64 string in the image column. This makes a dataset self-contained: a single TSV file holds all questions, answer options, ground-truth answers, and images.

The encode_image_to_base64() function handles encoding with configurable resizing:

  • target_size: thumbnails the image to fit within the given size
  • max_size: enforced via progressive downscaling
  • min_edge: enforced via upscaling
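
The dimension arithmetic behind the thumbnailing and min-edge steps can be sketched as follows. This is a minimal illustration, not VLMEvalKit's code: plan_resize is a hypothetical helper, the order of the two steps and the rounding are assumptions, and the max_size step is left out because it depends on the size of the encoded output rather than on the dimensions alone.

```python
def plan_resize(width, height, target_size=-1, min_edge=0):
    """Return the (width, height) an image would end up with.

    Hypothetical helper that mirrors the bullets above; the real
    encode_image_to_base64() may order or round these steps differently.
    """
    # Thumbnailing: shrink to fit inside a target_size x target_size box,
    # keeping the aspect ratio. The scale is capped at 1.0, so this step
    # never enlarges an image.
    if target_size > 0:
        scale = min(target_size / width, target_size / height, 1.0)
        width = max(1, round(width * scale))
        height = max(1, round(height * scale))

    # Min-edge enforcement: enlarge so the shorter side reaches min_edge.
    short = min(width, height)
    if short < min_edge:
        scale = min_edge / short
        width = round(width * scale)
        height = round(height * scale)

    return width, height


print(plan_resize(4000, 2000, target_size=1000))  # wide image, thumbnailed
print(plan_resize(50, 200, min_edge=100))         # narrow image, upscaled
```

Both steps preserve the aspect ratio, so only the overall scale of the image changes.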

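The decode_base64_to_image() function reverses the encoding, turning a base64 string back into an image; any resizing applied during encoding is not undone. Because a dataset is a single file, it can be distributed from an HTTP URL and checked against an MD5 checksum after download.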
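
An MD5 check on a downloaded TSV might look like the sketch below. It uses only the Python standard library; file_md5 and verify_tsv are illustrative names and are not taken from VLMEvalKit.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat even for multi-gigabyte TSVs.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tsv(path, expected_md5):
    """Return True if the file at `path` matches the published checksum."""
    return file_md5(path) == expected_md5.lower()
```

A mismatch usually means a truncated or corrupted download, in which case the file should be fetched again before use.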

Usage

Use this scheme when preparing a new benchmark TSV file for VLMEvalKit: encode every image to base64 and store the resulting string in the image column of the TSV.
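
As a rough illustration of that workflow, the sketch below writes and reads back a one-row TSV using only the standard library. Only the image column is named in this page; the other column names (index, question, answer), the helper names, and the placeholder bytes standing in for a real image file are assumptions made for the example.

```python
import base64
import csv
import io

# Assumed column layout; only "image" is confirmed by the description above.
FIELDS = ["index", "question", "answer", "image"]


def encode_bytes(image_bytes):
    """Base64-encode raw image file bytes into an ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")


def write_tsv(rows, fileobj):
    """Write dict rows as tab-separated values with a header line."""
    writer = csv.DictWriter(
        fileobj, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(fileobj):
    """Read the TSV back into a list of dicts."""
    # The csv module rejects fields longer than 131072 characters by
    # default, and base64 images are routinely larger, so raise the limit.
    csv.field_size_limit(2**31 - 1)
    return list(csv.DictReader(fileobj, delimiter="\t"))


# Placeholder bytes; in practice these come from open(path, "rb").read().
fake_image = bytes(range(256)) * 1024

buffer = io.StringIO()
write_tsv(
    [{"index": 0, "question": "What is shown?", "answer": "A",
      "image": encode_bytes(fake_image)}],
    buffer,
)

rows = read_tsv(io.StringIO(buffer.getvalue()))
restored = base64.b64decode(rows[0]["image"])
print(restored == fake_image)
```

The base64 alphabet contains no tabs, quotes, or newlines, so an encoded image always fits in a single TSV field without escaping.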

Theoretical Basis

Base64 encoding converts binary image data to ASCII text, enabling storage in text-based formats like TSV. Every 3 bytes of input become 4 ASCII characters of output, so the trade-off is a size increase of about 33% in exchange for complete self-containment.
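
The overhead figure follows directly from the 3-to-4 mapping and can be checked with a short standard-library sketch; base64_length is an illustrative helper name.

```python
import base64


def base64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    # Input is consumed in 3-byte groups; each group, including a final
    # partial one, produces exactly 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(300_000)               # stand-in for a 300 kB image file
encoded = base64.b64encode(raw)
overhead = len(encoded) / len(raw) - 1

print(len(encoded), base64_length(len(raw)))
print(f"overhead: {overhead:.1%}")
```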
