Implementation:Open compass VLMEvalKit GeminiWrapper

Field	Value
source	VLMEvalKit
domain	Vision, API_Integration

Overview

GeminiWrapper provides a VLMEvalKit API adapter for Google Gemini vision-language models.

Description

GeminiWrapper inherits from BaseAPI and supports two backends: the Google GenAI client library and Google Cloud Vertex AI. It handles video upload via the GenAI file API, supports configurable thinking budgets for reasoning tasks, and provides media resolution control (low/medium/high) for both images and video. Authentication uses the GOOGLE_API_KEY environment variable.
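The options described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the code in vlmeval/api/gemini.py. The helper name `resolve_settings`, the backend names `'genai'` and `'vertex'`, the enum-style resolution names, and the rule that an explicit `key` takes precedence over `GOOGLE_API_KEY` are all assumptions made for illustration.

```python
import os

# Assumed mapping from the low/medium/high levels to enum-style names.
MEDIA_RESOLUTION_NAMES = {
    'low': 'MEDIA_RESOLUTION_LOW',
    'medium': 'MEDIA_RESOLUTION_MEDIUM',
    'high': 'MEDIA_RESOLUTION_HIGH',
}


def resolve_settings(key=None, backend='genai', media_resolution=None,
                     thinking_budget=None, env=None):
    """Hypothetical helper: normalize constructor-style options into a dict."""
    env = os.environ if env is None else env
    if backend not in ('genai', 'vertex'):  # backend names are an assumption
        raise ValueError(f'unsupported backend: {backend!r}')
    # Assumed precedence: an explicit key first, then GOOGLE_API_KEY.
    api_key = key if key is not None else env.get('GOOGLE_API_KEY')
    settings = {'backend': backend, 'api_key': api_key}
    if media_resolution is not None:
        level = str(media_resolution).lower()
        if level not in MEDIA_RESOLUTION_NAMES:
            raise ValueError(f'media_resolution must be low/medium/high, got {media_resolution!r}')
        settings['media_resolution'] = MEDIA_RESOLUTION_NAMES[level]
    if thinking_budget is not None:
        # Stored as an integer token budget; left out entirely when not set.
        settings['thinking_budget'] = int(thinking_budget)
    return settings


if __name__ == '__main__':
    print(resolve_settings(media_resolution='high', thinking_budget=512,
                           env={'GOOGLE_API_KEY': 'dummy-key'}))
```

Passing `env` explicitly keeps the sketch testable without touching the real environment; a wrapper would normally read `os.environ` directly.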

Usage

Use this adapter when evaluating Google Gemini models (such as gemini-1.0-pro or gemini-1.5-pro) through the GenAI or Vertex AI backends.

Code Reference

Source: vlmeval/api/gemini.py, Lines: L1-186
Import: from vlmeval.api.gemini import GeminiWrapper

Signature:

class GeminiWrapper(BaseAPI):
    def __init__(self, model='gemini-1.0-pro', retry=5, key=None, verbose=True,
                 temperature=0.0, system_prompt=None, max_tokens=2048,
                 proxy=None, backend='genai', project_id='vlmeval',
                 thinking_budget=None, fps=1, media_resolution=None,
                 **kwargs): ...
    def generate_inner(self, inputs, **kwargs): ...

I/O Contract

Direction	Description
Inputs	message: a list of text/image/video content items; model-specific parameters are passed via kwargs
Outputs	generate() returns the prediction as a str; generate_inner() returns an (int, str, str) tuple of return code, answer, and log
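The contract above can be sketched as a retry loop that unpacks the (int, str, str) tuple and returns only the answer string. This is an illustration, not the BaseAPI implementation. The function `generate_with_retry`, the stub `make_flaky_backend`, the `FAIL_MSG` string, the convention that a return code of 0 means success, and the `type`/`value` dictionary layout of each message item are assumptions.

```python
FAIL_MSG = 'Failed to obtain answer via API.'  # placeholder failure string (assumption)


def generate_with_retry(generate_inner, message, retry=5):
    """Hypothetical stand-in for a retry loop around generate_inner()."""
    for _ in range(retry):
        ret_code, answer, log = generate_inner(message)
        # Assumed convention: ret_code == 0 signals success.
        if ret_code == 0 and answer:
            return answer
    return FAIL_MSG


def make_flaky_backend(failures):
    """Build a stub generate_inner that fails `failures` times, then succeeds."""
    state = {'calls': 0}

    def generate_inner(inputs, **kwargs):
        state['calls'] += 1
        if state['calls'] <= failures:
            return -1, '', f'attempt {state["calls"]} failed'
        # Echo the text items so the sketch runs without any network access.
        text = ' '.join(item['value'] for item in inputs if item['type'] == 'text')
        return 0, f'echo: {text}', 'ok'

    return generate_inner


# Assumed message layout: a list of dicts with 'type' and 'value' keys.
message = [
    {'type': 'image', 'value': 'demo.jpg'},  # placeholder path
    {'type': 'text', 'value': 'Describe this image.'},
]

if __name__ == '__main__':
    print(generate_with_retry(make_flaky_backend(1), message))
```

Callers only ever see the string result. The return code and log stay inside the loop, which is why generate() and generate_inner() have different return types in the table.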

Usage Examples

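# Example instantiation (requires GOOGLE_API_KEY to be set; 'demo.jpg' is a placeholder path)
from vlmeval.api.gemini import GeminiWrapper

model = GeminiWrapper(model='gemini-1.0-pro')
message = [dict(type='image', value='demo.jpg'), dict(type='text', value='Describe this image.')]
response = model.generate(message)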
