Implementation:Arize ai Phoenix Legacy GeminiModel

From Leeroopedia

Overview

GeminiModel is a legacy model wrapper in the phoenix-evals package for calling Google Gemini models through the VertexAI SDK. It extends BaseModel and builds on the vertexai.preview.generative_models module. The wrapper provides synchronous and native asynchronous generation, dynamic rate limiting triggered by GCP ResourceExhausted errors, configurable VertexAI project, location, and credentials, and structured token usage tracking that includes thinking tokens. It differs from GoogleGenAIModel in that it uses the VertexAI SDK directly rather than the Google GenAI SDK.

LLM_Evaluation Model_Integration

Description

The GeminiModel class is implemented as a Python dataclass that extends the abstract BaseModel. Key characteristics include:

  • VertexAI SDK integration: Uses vertexai.preview.generative_models.GenerativeModel for model instantiation and content generation.
  • Project/Location/Credentials configuration: Initializes VertexAI with configurable GCP project, location, and credentials via vertexai.init().
  • Full async support: Both sync (_model.generate_content()) and async (_model.generate_content_async()) generation methods are natively implemented with rate limiting applied inline.
  • Dynamic rate limiting: Configures the RateLimiter to treat google.api_core.exceptions.ResourceExhausted as the rate-limit error, starting at 5 requests per second (initial_rate_limit) with a 1-minute enforcement window.
  • Reduced default concurrency: Sets default_concurrency=5 (compared to the base class default of 20) because the VertexAI SDK encounters connection pool limits at higher concurrency.
  • Model token limit mapping: Includes a comprehensive mapping of Gemini model names to their token limits (e.g., "gemini-2.5-pro": 2097152).
  • Thinking token tracking: Usage extraction includes thoughts_token_count alongside candidates_token_count.
  • Text-only prompts: Converts multimodal prompts to text-only format via prompt.to_text_only_prompt().
  • Verbose error handling: When extraction of candidate text fails, logs detailed debug information if verbosity is enabled.
  • Client reloading: Implements reload_client() for reinitializing the VertexAI client.
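
The dynamic rate limiting described above can be pictured with a small, self-contained sketch. This is not the Phoenix RateLimiter: the class names, the halving policy, and the retry count are assumptions made for illustration, QuotaExceeded is a hypothetical stand-in for google.api_core.exceptions.ResourceExhausted, and the 1-minute enforcement window is not modeled.

```python
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted."""


class AdaptiveRateLimiter:
    """Toy limiter: spaces calls at `rate` per second and halves the rate on quota errors."""

    def __init__(self, initial_rate: float = 5.0, min_rate: float = 0.1) -> None:
        self.rate = initial_rate
        self.min_rate = min_rate
        self._next_allowed = 0.0  # monotonic timestamp of the next free slot

    def _wait_for_slot(self) -> None:
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
        # Reserve the following slot one interval after the slot just used.
        self._next_allowed = max(now, self._next_allowed) + 1.0 / self.rate

    def call(self, fn, *args, max_retries: int = 3, **kwargs):
        for attempt in range(max_retries + 1):
            self._wait_for_slot()
            try:
                return fn(*args, **kwargs)
            except QuotaExceeded:
                if attempt == max_retries:
                    raise
                # Assumed policy: back off by halving the request rate.
                self.rate = max(self.min_rate, self.rate / 2)
```

In this sketch a quota error slows every later call, not only the retried one, which is the general idea behind adapting the rate to observed ResourceExhausted responses.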

Usage

# Set up your GCP environment
# https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal

from phoenix.evals.models import GeminiModel

# Basic usage with defaults (gemini-2.5-flash)
model = GeminiModel()

# Specify project and model
model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-gcp-project",
    location="us-central1",
)

response = model("Explain the concept of attention mechanisms in transformers.")
print(response)

Code Reference

Source Location

| Property | Value |
|---|---|
| Repository | Arize-ai/phoenix |
| File | packages/phoenix-evals/src/phoenix/evals/legacy/models/vertex.py |
| Lines | 250 |
| Module | phoenix.evals.legacy.models.vertex |

Class Signature

@dataclass
class GeminiModel(BaseModel):
    project: Optional[str] = None
    location: Optional[str] = None
    credentials: Optional["Credentials"] = None
    default_concurrency: int = 5
    model: str = "gemini-2.5-flash"
    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)
    initial_rate_limit: int = 5
    timeout: int = 120
    model_kwargs: Dict[str, Any] = field(default_factory=dict)

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| project | Optional[str] | None | GCP project ID. |
| location | Optional[str] | None | GCP location/region (defaults to us-central1 if not set). |
| credentials | Optional[Credentials] | None | Google auth credentials. |
| default_concurrency | int | 5 | Reduced concurrency to avoid connection pool limits. |
| model | str | "gemini-2.5-flash" | The Gemini model name to use. |
| temperature | float | 0.0 | Sampling temperature. |
| max_tokens | int | 1024 | Maximum output tokens. |
| top_p | float | 1 | Nucleus sampling probability mass. |
| top_k | int | 32 | Top-K sampling cutoff. |
| stop_sequences | List[str] | [] | Sequences that halt generation. |
| initial_rate_limit | int | 5 | Initial requests-per-second rate limit. |
| timeout | int | 120 | Timeout for API requests in seconds. |
| model_kwargs | Dict[str, Any] | {} | Additional keyword arguments passed to the GenerativeModel constructor. |
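
As a rough illustration of how the sampling parameters above could feed the generation_config property, the sketch below maps them onto a plain dictionary. GenerationSettings is a hypothetical stand-in, not the real class, and the dictionary keys are assumed to follow the field names of the VertexAI GenerationConfig (for example max_output_tokens for max_tokens); the actual property may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GenerationSettings:
    """Hypothetical stand-in holding only the sampling fields of GeminiModel."""

    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)

    @property
    def generation_config(self) -> Dict[str, Any]:
        # Assumed key names, modeled on VertexAI's GenerationConfig fields.
        return {
            "temperature": self.temperature,
            "max_output_tokens": self.max_tokens,
            "top_p": self.top_p,
            "top_k": self.top_k,
            "stop_sequences": self.stop_sequences,
        }
```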

Model Token Limit Mapping

| Model | Token Limit |
|---|---|
| gemini-pro | 32,760 |
| gemini-pro-vision | 16,384 |
| gemini-1.5-flash | 1,048,576 |
| gemini-1.5-pro | 2,097,152 |
| gemini-2.0-flash | 1,048,576 |
| gemini-2.0-flash-lite | 1,048,576 |
| gemini-2.5-flash | 1,048,576 |
| gemini-2.5-flash-lite | 1,048,576 |
| gemini-2.5-pro | 2,097,152 |
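
The table above can be expressed as a dictionary lookup. The sketch below copies the values from the table; the function name token_limit_for, the longest-prefix fallback for versioned model names, and the default for unknown models are assumptions for illustration and are not taken from the Phoenix source.

```python
MODEL_TOKEN_LIMIT = {
    "gemini-pro": 32760,
    "gemini-pro-vision": 16384,
    "gemini-1.5-flash": 1048576,
    "gemini-1.5-pro": 2097152,
    "gemini-2.0-flash": 1048576,
    "gemini-2.0-flash-lite": 1048576,
    "gemini-2.5-flash": 1048576,
    "gemini-2.5-flash-lite": 1048576,
    "gemini-2.5-pro": 2097152,
}


def token_limit_for(model: str, default: int = 32760) -> int:
    """Return the limit for `model`: exact match first, then the longest matching prefix."""
    if model in MODEL_TOKEN_LIMIT:
        return MODEL_TOKEN_LIMIT[model]
    # Assumed fallback: a versioned name such as "gemini-2.5-pro-001" inherits
    # the limit of the longest table entry it starts with.
    prefixes = [name for name in MODEL_TOKEN_LIMIT if model.startswith(name)]
    if prefixes:
        return MODEL_TOKEN_LIMIT[max(prefixes, key=len)]
    return default  # hypothetical default for unknown models
```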

Key Methods

| Method | Signature | Description |
|---|---|---|
| __post_init__ | (self) -> None | Initializes client, VertexAI, and rate limiter. |
| reload_client | (self) -> None | Reinitializes the VertexAI client and model. |
| _init_client | (self) -> None | Imports VertexAI SDK and creates the GenerativeModel instance. |
| _init_vertex_ai | (self) -> None | Calls vertexai.init() with project, location, and credentials. |
| _init_rate_limiter | (self) -> None | Configures rate limiter with ResourceExhausted. |
| _generate_with_extra | (self, prompt, **kwargs) -> Tuple[str, ExtraInfo] | Synchronous generation with inline rate limiting. |
| _async_generate_with_extra | async (self, prompt, **kwargs) -> Tuple[str, ExtraInfo] | Native async generation via generate_content_async(). |
| generation_config | @property -> Dict[str, Any] | Returns the generation configuration dictionary. |
| _extract_text | (self, response: Any) -> str | Extracts candidate text with verbose error logging. |
| _extract_usage | (self, response: Any) -> Optional[Usage] | Extracts token usage including thinking tokens. |
| _construct_prompt | (self, prompt: MultimodalPrompt) -> str | Converts multimodal prompt to text-only string. |
Import

from phoenix.evals.models import GeminiModel

I/O Contract

| Direction | Type | Description |
|---|---|---|
| Input | Union[str, MultimodalPrompt] | A text string or multimodal prompt (converted to text-only). |
| Input (optional) | Optional[str] | Instruction parameter (ignored; stripped before API call). |
| Output | str | Generated text response from the Gemini model. |
| Output (with extra) | Tuple[str, ExtraInfo] | Generated text paired with ExtraInfo containing optional Usage token counts. |
| Error | ImportError | Raised if vertexai package is not installed. |
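
The ImportError row reflects a common pattern for optional dependencies: import the SDK lazily and re-raise with an installation hint. The sketch below shows that pattern in generic form; require_module and its message wording are hypothetical and are not the Phoenix implementation.

```python
import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import `name` lazily, raising ImportError with an install hint on failure."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import '{name}'. Install it with: {install_hint}"
        ) from exc
```

For the VertexAI SDK, a call might look like require_module("vertexai", "pip install google-cloud-aiplatform"), since the vertexai module ships in the google-cloud-aiplatform distribution.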

Usage Examples

Basic Generation

from phoenix.evals.models import GeminiModel

model = GeminiModel(
    model="gemini-2.5-flash",
    project="my-project",
    temperature=0.0,
)
response = model("What are the benefits of using vector databases?")
print(response)

With Custom Credentials

from google.oauth2 import service_account
from phoenix.evals.models import GeminiModel

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json"
)

model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-project",
    location="europe-west1",
    credentials=credentials,
)

Async Generation

import asyncio
from phoenix.evals.models import GeminiModel

model = GeminiModel(model="gemini-2.5-flash", project="my-project")

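async def generate():
    # _async_generate is an underscore-prefixed (private) method; it is awaited
    # directly here only to demonstrate the native async path.
    return await model._async_generate("Explain RAG architecture.")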

response = asyncio.run(generate())
print(response)
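
When many prompts are sent at once, the reduced default_concurrency of 5 suggests capping the number of in-flight requests. The following sketch does this with asyncio.Semaphore around any async generate callable; generate_all and its parameters are hypothetical helpers for illustration and are not part of phoenix-evals.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_all(
    prompts: List[str],
    generate: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> List[str]:
    """Run `generate` over all prompts with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await generate(prompt)

    # gather preserves the order of the input prompts in its results.
    return await asyncio.gather(*(bounded(p) for p in prompts))
```

Any coroutine function that takes a prompt string and returns text can be passed as generate, for example a thin wrapper around the model's async generation method.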
