Implementation:Open compass VLMEvalKit GeminiWrapper

From Leeroopedia
Field     Value
source    VLMEvalKit
domain    Vision, API_Integration

Overview

GeminiWrapper provides a VLMEvalKit API adapter for Google Gemini vision-language models.

Description

GeminiWrapper inherits from BaseAPI and supports two backends, selected with the backend argument: the Google GenAI client library (the default, 'genai') and Google Cloud Vertex AI. It uploads video through the GenAI file API, accepts a configurable thinking budget (thinking_budget) for reasoning tasks, and exposes media resolution control (media_resolution: low, medium, or high) for both images and video. Authentication uses the GOOGLE_API_KEY environment variable.
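
The key lookup and the resolution setting described above can be sketched as follows. This is a minimal standalone illustration, not the code in vlmeval/api/gemini.py: the helper names resolve_api_key and normalize_media_resolution are hypothetical, and the precedence of an explicit key over the environment variable is an assumption based on the key=None default in the signature.

```python
import os

# Hypothetical helpers; names and behavior are assumptions for illustration only.
VALID_RESOLUTIONS = ("low", "medium", "high")


def resolve_api_key(key=None, env=None):
    """Return an explicit key if given, else fall back to GOOGLE_API_KEY."""
    env = os.environ if env is None else env
    key = key or env.get("GOOGLE_API_KEY")
    if not key:
        raise ValueError("Set GOOGLE_API_KEY or pass key=...")
    return key


def normalize_media_resolution(value):
    """Validate a low/medium/high setting; None means 'use the default'."""
    if value is None:
        return None
    value = value.lower()
    if value not in VALID_RESOLUTIONS:
        raise ValueError(f"media_resolution must be one of {VALID_RESOLUTIONS}")
    return value
```

Failing early on a missing key or an unknown resolution keeps configuration mistakes out of the request path, where they would otherwise surface as API errors during an evaluation run.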

Usage

Use this adapter when evaluating Google Gemini models (such as gemini-1.0-pro or gemini-1.5-pro) through the GenAI or Vertex AI backends.

Code Reference

  • Source: vlmeval/api/gemini.py, Lines: L1-186
  • Import: from vlmeval.api.gemini import GeminiWrapper

Signature:

class GeminiWrapper(BaseAPI):
    def __init__(self, model='gemini-1.0-pro', retry=5, key=None, verbose=True,
                 temperature=0.0, system_prompt=None, max_tokens=2048,
                 proxy=None, backend='genai', project_id='vlmeval',
                 thinking_budget=None, fps=1, media_resolution=None,
                 **kwargs): ...
    def generate_inner(self, inputs, **kwargs): ...

I/O Contract

Direction   Description
Inputs      message: a content list of text/image/video items; model-specific params via kwargs
Outputs     generate() returns the prediction as a str; generate_inner() returns an (int, str, str) tuple of return code, answer, and log
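
The relationship between the two return types can be sketched with a toy class. It is a rough illustration under stated assumptions, not the VLMEvalKit BaseAPI implementation: EchoAPI is hypothetical, a return code of 0 is assumed to signal success, and the retry loop is a simplified guess at how retry is used.

```python
import time


class EchoAPI:
    """Toy stand-in for an API wrapper; not the VLMEvalKit BaseAPI class."""

    fail_msg = "Failed to obtain answer via API."

    def __init__(self, retry=5, wait=0.0):
        self.retry = retry
        self.wait = wait
        self.calls = 0

    def generate_inner(self, inputs, **kwargs):
        # Return (ret_code, answer, log); ret_code 0 is assumed to mean success.
        self.calls += 1
        if self.calls < 3:  # simulate two transient failures
            return -1, self.fail_msg, "simulated error"
        text = " ".join(m["value"] for m in inputs if m["type"] == "text")
        return 0, f"echo: {text}", "ok"

    def generate(self, message, **kwargs):
        # Retry until generate_inner reports success, then return only the str.
        for _ in range(self.retry):
            ret_code, answer, _log = self.generate_inner(message, **kwargs)
            if ret_code == 0:
                return answer
            time.sleep(self.wait)
        return self.fail_msg


api = EchoAPI()
print(api.generate([{"type": "text", "value": "hello"}]))  # prints "echo: hello"
```

Keeping the status code and log inside generate_inner() lets the public generate() expose a plain string to evaluation code while the wrapper still has what it needs to decide on retries.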

Usage Examples

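# Example instantiation (import path from the Code Reference section)
from vlmeval.api.gemini import GeminiWrapper
model = GeminiWrapper(model='gemini-1.0-pro')
# Illustrative message; assumes a list of dicts with 'type' and 'value' keys
message = [dict(type='text', value='Describe a sunset in one sentence.')]
response = model.generate(message)  # str prediction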
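
For mixed text, image, and video inputs, the content list could be assembled with a small helper like the one below. This is a sketch only: build_message is a hypothetical name, the file names are placeholders, the extension sets are illustrative, and the list-of-dicts layout with type and value keys is an assumption about the message format.

```python
from pathlib import Path

# Illustrative extension sets; adjust to the formats you actually use.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".avi"}


def build_message(prompt, *media_paths):
    """Return media items followed by the text prompt as a content list."""
    message = []
    for path in media_paths:
        ext = Path(path).suffix.lower()
        if ext in IMAGE_EXTS:
            kind = "image"
        elif ext in VIDEO_EXTS:
            kind = "video"
        else:
            raise ValueError(f"Unsupported media type: {path}")
        message.append({"type": kind, "value": str(path)})
    message.append({"type": "text", "value": prompt})
    return message


# Placeholder paths; no file access happens while building the list.
message = build_message("What happens in this clip?", "frame.png", "clip.mp4")
```

The resulting list can then be passed to model.generate(message) in place of a hand-written one.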
