Implementation:Arize ai Phoenix Legacy GeminiModel

Overview

GeminiModel is a legacy model wrapper in the phoenix-evals package for calling Google Gemini models through the VertexAI SDK. It extends BaseModel and builds on the vertexai.preview.generative_models module. It provides synchronous and native asynchronous generation, dynamic rate limiting triggered by GCP ResourceExhausted errors, configurable VertexAI project, location, and credentials, and structured token usage tracking that includes thinking tokens. Unlike GoogleGenAIModel, it uses the VertexAI SDK directly rather than the Google GenAI SDK.

LLM_Evaluation Model_Integration

Description

The GeminiModel class is implemented as a Python dataclass that extends the abstract BaseModel. Key characteristics include:

VertexAI SDK integration: Uses vertexai.preview.generative_models.GenerativeModel for model instantiation and content generation.
Project/Location/Credentials configuration: Initializes VertexAI with configurable GCP project, location, and credentials via vertexai.init().
Sync and async support: Synchronous generation calls _model.generate_content() and asynchronous generation calls _model.generate_content_async(); both paths are implemented natively, with rate limiting applied inline.
Dynamic rate limiting: Configures the RateLimiter with google.api_core.exceptions.ResourceExhausted at 5 requests per second with a 1-minute enforcement window.
Reduced default concurrency: Sets default_concurrency=5 (compared to the base class default of 20) because the VertexAI SDK encounters connection pool limits at higher concurrency.
Model token limit mapping: Includes a comprehensive mapping of Gemini model names to their token limits (e.g., "gemini-2.5-pro": 2097152).
Thinking token tracking: Usage extraction includes thoughts_token_count alongside candidates_token_count.
Text-only prompts: Converts multimodal prompts to text-only format via prompt.to_text_only_prompt().
Verbose error handling: When extraction of candidate text fails, logs detailed debug information if verbosity is enabled.
Client reloading: Implements reload_client() for reinitializing the VertexAI client.
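
The dynamic rate limiting item above can be illustrated with a small, self-contained sketch. This is not Phoenix's RateLimiter: the class name `AdaptiveLimiter`, the rate-halving policy, the retry count, and the local `ResourceExhausted` stand-in are all assumptions chosen to show the general idea of slowing down when the API reports an exhausted quota.

```python
import time


class ResourceExhausted(Exception):
    """Local stand-in for google.api_core.exceptions.ResourceExhausted."""


class AdaptiveLimiter:
    """Hypothetical limiter: spaces calls out and slows down on quota errors."""

    def __init__(self, initial_rate: float = 5.0, min_rate: float = 0.5) -> None:
        self.rate = initial_rate  # allowed requests per second
        self.min_rate = min_rate
        self._last_call = 0.0

    def _wait_for_slot(self) -> None:
        # Sleep just long enough to keep calls at least 1/rate seconds apart.
        delay = self._last_call + 1.0 / self.rate - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last_call = time.monotonic()

    def call(self, fn, *args, max_retries: int = 3, **kwargs):
        for _ in range(max_retries + 1):
            self._wait_for_slot()
            try:
                return fn(*args, **kwargs)
            except ResourceExhausted:
                # Quota hit: halve the rate (assumed policy) and retry.
                self.rate = max(self.min_rate, self.rate / 2)
        raise ResourceExhausted("rate limit still exceeded after retries")
```

A caller would wrap each request, for example `limiter.call(send_request, prompt)`, where `send_request` is whatever function performs the API call.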

Usage

# Set up your GCP environment
# https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal

from phoenix.evals.models import GeminiModel

# Basic usage with defaults (gemini-2.5-flash)
model = GeminiModel()

# Specify project and model
model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-gcp-project",
    location="us-central1",
)

response = model("Explain the concept of attention mechanisms in transformers.")
print(response)

Code Reference

Source Location

Property	Value
Repository	Arize-ai/phoenix
File	`packages/phoenix-evals/src/phoenix/evals/legacy/models/vertex.py`
Lines	250
Module	`phoenix.evals.legacy.models.vertex`

Class Signature

@dataclass
class GeminiModel(BaseModel):
    project: Optional[str] = None
    location: Optional[str] = None
    credentials: Optional["Credentials"] = None
    default_concurrency: int = 5
    model: str = "gemini-2.5-flash"
    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)
    initial_rate_limit: int = 5
    timeout: int = 120
    model_kwargs: Dict[str, Any] = field(default_factory=dict)

Constructor Parameters

Parameter	Type	Default	Description
project	`Optional[str]`	`None`	GCP project ID.
location	`Optional[str]`	`None`	GCP location/region (defaults to us-central1 if not set).
credentials	`Optional[Credentials]`	`None`	Google auth credentials.
default_concurrency	`int`	`5`	Reduced concurrency to avoid connection pool limits.
model	`str`	`"gemini-2.5-flash"`	The Gemini model name to use.
temperature	`float`	`0.0`	Sampling temperature.
max_tokens	`int`	`1024`	Maximum output tokens.
top_p	`float`	`1`	Nucleus sampling probability mass.
top_k	`int`	`32`	Top-K sampling cutoff.
stop_sequences	`List[str]`	`[]`	Sequences that halt generation.
initial_rate_limit	`int`	`5`	Initial requests-per-second rate limit.
timeout	`int`	`120`	Timeout for API requests in seconds.
model_kwargs	`Dict[str, Any]`	`{}`	Additional keyword arguments passed to the `GenerativeModel` constructor.
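
As a rough illustration of how the sampling parameters above might be gathered into the dictionary returned by the generation_config property, the following sketch uses a hypothetical `GenerationSettings` dataclass. The key names (in particular `max_output_tokens`) are an assumption modeled on VertexAI generation settings, not a confirmed copy of the Phoenix source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GenerationSettings:
    """Hypothetical stand-in holding the sampling fields listed above."""

    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)

    @property
    def generation_config(self) -> Dict[str, Any]:
        # Key names are assumed; max_tokens is mapped to max_output_tokens.
        return {
            "temperature": self.temperature,
            "max_output_tokens": self.max_tokens,
            "top_p": self.top_p,
            "top_k": self.top_k,
            "stop_sequences": self.stop_sequences,
        }


config = GenerationSettings(temperature=0.2, stop_sequences=["END"]).generation_config
print(config)
```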

Model Token Limit Mapping

Model	Token Limit
`gemini-pro`	32,760
`gemini-pro-vision`	16,384
`gemini-1.5-flash`	1,048,576
`gemini-1.5-pro`	2,097,152
`gemini-2.0-flash`	1,048,576
`gemini-2.0-flash-lite`	1,048,576
`gemini-2.5-flash`	1,048,576
`gemini-2.5-flash-lite`	1,048,576
`gemini-2.5-pro`	2,097,152
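
One way such a mapping could be consulted is sketched below. The values are copied from the table above; the helper name `token_limit_for` and the longest-prefix fallback for versioned names (such as a name with a numeric suffix) are assumptions for illustration and may not match how the wrapper resolves limits.

```python
from typing import Optional

# Values copied from the table above.
MODEL_TOKEN_LIMITS = {
    "gemini-pro": 32760,
    "gemini-pro-vision": 16384,
    "gemini-1.5-flash": 1048576,
    "gemini-1.5-pro": 2097152,
    "gemini-2.0-flash": 1048576,
    "gemini-2.0-flash-lite": 1048576,
    "gemini-2.5-flash": 1048576,
    "gemini-2.5-flash-lite": 1048576,
    "gemini-2.5-pro": 2097152,
}


def token_limit_for(model: str) -> Optional[int]:
    """Return the limit for an exact name, else for the longest matching prefix."""
    if model in MODEL_TOKEN_LIMITS:
        return MODEL_TOKEN_LIMITS[model]
    # Assumed fallback: treat a known name as a prefix of a versioned name.
    prefixes = [name for name in MODEL_TOKEN_LIMITS if model.startswith(name)]
    if not prefixes:
        return None
    return MODEL_TOKEN_LIMITS[max(prefixes, key=len)]
```

Returning None for unknown names leaves the caller free to pick its own default rather than silently guessing a limit.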

Key Methods

Method	Signature	Description
__post_init__	`(self) -> None`	Initializes client, VertexAI, and rate limiter.
reload_client	`(self) -> None`	Reinitializes the VertexAI client and model.
_init_client	`(self) -> None`	Imports VertexAI SDK and creates the `GenerativeModel` instance.
_init_vertex_ai	`(self) -> None`	Calls `vertexai.init()` with project, location, and credentials.
_init_rate_limiter	`(self) -> None`	Configures rate limiter with `ResourceExhausted`.
_generate_with_extra	`(self, prompt, **kwargs) -> Tuple[str, ExtraInfo]`	Synchronous generation with inline rate limiting.
_async_generate_with_extra	`async (self, prompt, **kwargs) -> Tuple[str, ExtraInfo]`	Native async generation via `generate_content_async()`.
generation_config	`@property -> Dict[str, Any]`	Returns the generation configuration dictionary.
_extract_text	`(self, response: Any) -> str`	Extracts candidate text with verbose error logging.
_extract_usage	`(self, response: Any) -> Optional[Usage]`	Extracts token usage including thinking tokens.
_construct_prompt	`(self, prompt: MultimodalPrompt) -> str`	Converts multimodal prompt to text-only string.

Import

from phoenix.evals.models import GeminiModel

I/O Contract

Direction	Type	Description
Input	`Union[str, MultimodalPrompt]`	A text string or multimodal prompt (converted to text-only).
Input (optional)	`Optional[str]`	Instruction parameter (ignored; stripped before API call).
Output	`str`	Generated text response from the Gemini model.
Output (with extra)	`Tuple[str, ExtraInfo]`	Generated text paired with `ExtraInfo` containing optional `Usage` token counts.
Error	`ImportError`	Raised if `vertexai` package is not installed.
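
The ImportError row reflects a common lazy-import pattern for optional dependencies. A minimal sketch of that pattern follows; the helper name `require_package` and the error wording are hypothetical and are not taken from the Phoenix source.

```python
import importlib
from types import ModuleType


def require_package(name: str, install_hint: str) -> ModuleType:
    """Import a package lazily, raising ImportError with an install hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import '{name}'. Install it with: {install_hint}"
        ) from exc
```

For this wrapper the call might look like `require_package("vertexai", "pip install google-cloud-aiplatform")`, since the `vertexai` module ships with the google-cloud-aiplatform distribution.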

Usage Examples

Basic Generation

from phoenix.evals.models import GeminiModel

model = GeminiModel(
    model="gemini-2.5-flash",
    project="my-project",
    temperature=0.0,
)
response = model("What are the benefits of using vector databases?")
print(response)

With Custom Credentials

from google.oauth2 import service_account
from phoenix.evals.models import GeminiModel

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json"
)

model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-project",
    location="europe-west1",
    credentials=credentials,
)

Async Generation

import asyncio
from phoenix.evals.models import GeminiModel

model = GeminiModel(model="gemini-2.5-flash", project="my-project")

async def generate():
    # Note: the leading underscore marks _async_generate as a non-public method.
    result = await model._async_generate("Explain RAG architecture.")
    return result

response = asyncio.run(generate())
print(response)
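
When many prompts are sent at once, a concurrency cap such as the default of 5 can be approximated with a semaphore. The sketch below is self-contained and does not call Gemini: `run_bounded` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real asynchronous model call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    generate: Callable[[str], Awaitable[str]],
    prompts: List[str],
    concurrency: int = 5,
) -> List[str]:
    """Run generate() over prompts with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(prompt: str) -> str:
        async with semaphore:
            return await generate(prompt)

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(worker(p) for p in prompts))


async def fake_generate(prompt: str) -> str:
    # Placeholder for a real asynchronous model call.
    await asyncio.sleep(0.01)
    return prompt.upper()


results = asyncio.run(run_bounded(fake_generate, ["a", "b", "c"]))
print(results)  # ['A', 'B', 'C']
```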

Related Pages

Arize_ai_Phoenix_Legacy_BaseModel - Abstract base class that GeminiModel extends
Arize_ai_Phoenix_Legacy_GoogleGenAIModel - Alternative Gemini wrapper using the Google GenAI SDK (supports multimodal inputs)
Arize_ai_Phoenix_Legacy_VertexAIModel - Legacy VertexAI text/code generation wrapper (deprecated, for non-Gemini models)
Arize_ai_Phoenix_Legacy_AnthropicModel - Anthropic model wrapper (similar async pattern)
Arize_ai_Phoenix_Legacy_OpenAIModel - OpenAI model wrapper
