Implementation:Open compass VLMEvalKit GeminiWrapper
| Field | Value |
|---|---|
| source | VLMEvalKit |
| domain | Vision, API_Integration |
Overview
GeminiWrapper provides a VLMEvalKit API adapter for Google Gemini vision-language models.
Description
GeminiWrapper inherits from BaseAPI and supports two backends: the Google GenAI client library and Google Cloud Vertex AI. It handles video upload via the GenAI file API, supports configurable thinking budgets for reasoning tasks, and provides media resolution control (low/medium/high) for both images and video. Authentication uses the GOOGLE_API_KEY environment variable.
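The options described above correspond to constructor keyword arguments. The following is a minimal sketch, not taken from the VLMEvalKit source: `build_gemini_kwargs` is a hypothetical helper that checks option values before they are handed to `GeminiWrapper(**kwargs)`. The backend names `'genai'` and `'vertex'`, the string values `'low'`, `'medium'`, `'high'` for `media_resolution`, and the integer type of `thinking_budget` are assumptions inferred from the description and signature; verify them against `vlmeval/api/gemini.py` before relying on them.

```python
# Hypothetical helper for illustration; not part of VLMEvalKit.
ASSUMED_BACKENDS = ('genai', 'vertex')              # 'vertex' spelling is an assumption
ASSUMED_RESOLUTIONS = (None, 'low', 'medium', 'high')


def build_gemini_kwargs(model='gemini-1.5-pro', backend='genai',
                        thinking_budget=None, media_resolution=None, fps=1):
    """Validate options and return kwargs suitable for GeminiWrapper(**kwargs)."""
    if backend not in ASSUMED_BACKENDS:
        raise ValueError(f'unknown backend: {backend!r}')
    if media_resolution not in ASSUMED_RESOLUTIONS:
        raise ValueError(f'unknown media_resolution: {media_resolution!r}')
    if thinking_budget is not None and not isinstance(thinking_budget, int):
        raise TypeError('thinking_budget is assumed to be an int or None')
    return dict(model=model, backend=backend, thinking_budget=thinking_budget,
                media_resolution=media_resolution, fps=fps)


kwargs = build_gemini_kwargs(backend='genai', thinking_budget=1024,
                             media_resolution='low')

# With VLMEvalKit installed and GOOGLE_API_KEY set, the result could be used as:
#   from vlmeval.api.gemini import GeminiWrapper
#   model = GeminiWrapper(**kwargs)
```

Validating up front keeps a misspelled option from surfacing only after a remote call has been attempted.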
Usage
Use this adapter when evaluating Google Gemini models (such as gemini-1.0-pro or gemini-1.5-pro) through the GenAI or Vertex AI backends.
Code Reference
- Source: vlmeval/api/gemini.py, Lines: L1-186
- Import: from vlmeval.api.gemini import GeminiWrapper
Signature:
    class GeminiWrapper(BaseAPI):
        def __init__(self, model='gemini-1.0-pro', retry=5, key=None, verbose=True,
                     temperature=0.0, system_prompt=None, max_tokens=2048,
                     proxy=None, backend='genai', project_id='vlmeval',
                     thinking_budget=None, fps=1, media_resolution=None,
                     **kwargs): ...
        def generate_inner(self, inputs, **kwargs): ...
I/O Contract
| Direction | Description |
|---|---|
| Inputs | message: a list of text, image, and video content items; model-specific parameters are passed via kwargs |
| Outputs | generate() returns the prediction as a str; generate_inner() returns an (int, str, str) tuple of return code, answer, and log message |
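To illustrate the contract in the table, the sketch below shows one way a `generate()`-style method could consume the tuple returned by `generate_inner()`. It is a self-contained stand-in and not the BaseAPI implementation: `StubAPI`, its simulated failures, the failure string, and the convention that a return code of 0 signals success are all assumptions made for illustration. The message items use `type` and `value` keys, and the image path is a placeholder.

```python
import time


class StubAPI:
    """Hypothetical stand-in for a BaseAPI subclass; makes no network calls."""

    def __init__(self, retry=5, wait=0.0, fail_first=2):
        self.retry = retry
        self.wait = wait
        self.fail_first = fail_first    # number of simulated failed attempts
        self.calls = 0

    def generate_inner(self, inputs, **kwargs):
        # Returns (return code, answer, log); 0 is assumed to mean success.
        self.calls += 1
        if self.calls <= self.fail_first:
            return -1, '', 'simulated transient error'
        text = ' '.join(x['value'] for x in inputs if x['type'] == 'text')
        return 0, f'echo: {text}', 'ok'

    def generate(self, message, **kwargs):
        # Retry until generate_inner reports success, then return only the answer.
        for _ in range(self.retry):
            code, answer, log = self.generate_inner(message, **kwargs)
            if code == 0:
                return answer
            time.sleep(self.wait)
        return 'Failed to obtain answer via API.'  # placeholder failure message


message = [
    dict(type='image', value='example.jpg'),        # placeholder path
    dict(type='text', value='Describe this image.'),
]
api = StubAPI()
prediction = api.generate(message)
```

Separating the two methods this way lets the inner call report status and diagnostics while callers of `generate()` only ever see a string.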
Usage Examples
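# Example instantiation (authentication uses the GOOGLE_API_KEY environment variable)
from vlmeval.api.gemini import GeminiWrapper
model = GeminiWrapper(model='gemini-1.0-pro')
# message is a content list; the image path below is a placeholder
message = [dict(type='image', value='example.jpg'), dict(type='text', value='Describe this image.')]
response = model.generate(message)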