Overview
GeminiModel is a legacy model wrapper in the phoenix-evals package that provides an interface for using Google Gemini models via the VertexAI SDK. It extends BaseModel and integrates with the vertexai.preview.generative_models module, providing both synchronous and native asynchronous generation, dynamic rate limiting based on GCP ResourceExhausted errors, configurable VertexAI project/location/credentials, and structured token usage tracking including thinking tokens. This wrapper differs from GoogleGenAIModel by using the VertexAI SDK directly rather than the Google GenAI SDK.
LLM_Evaluation
Model_Integration
Description
The GeminiModel class is implemented as a Python dataclass that extends the abstract BaseModel. Key characteristics include:
- VertexAI SDK integration: Uses vertexai.preview.generative_models.GenerativeModel for model instantiation and content generation.
- Project/Location/Credentials configuration: Initializes VertexAI with configurable GCP project, location, and credentials via vertexai.init().
- Full async support: Both sync (_model.generate_content()) and async (_model.generate_content_async()) generation methods are natively implemented with rate limiting applied inline.
- Dynamic rate limiting: Configures the RateLimiter with google.api_core.exceptions.ResourceExhausted at 5 requests per second with a 1-minute enforcement window.
- Reduced default concurrency: Sets default_concurrency=5 (compared to the base class default of 20) because the VertexAI SDK encounters connection pool limits at higher concurrency.
- Model token limit mapping: Includes a comprehensive mapping of Gemini model names to their token limits (e.g., "gemini-2.5-pro": 2097152).
- Thinking token tracking: Usage extraction includes thoughts_token_count alongside candidates_token_count.
- Text-only prompts: Converts multimodal prompts to text-only format via prompt.to_text_only_prompt().
- Verbose error handling: When extraction of candidate text fails, logs detailed debug information if verbosity is enabled.
- Client reloading: Implements reload_client() for reinitializing the VertexAI client.
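The dynamic rate-limiting behaviour listed above can be pictured with a small sketch. This is not the phoenix RateLimiter: `QuotaExceeded`, `AdaptiveLimiter`, and the halve-on-error policy are hypothetical stand-ins, chosen only to show the general idea of lowering the request rate when a quota error (such as google.api_core.exceptions.ResourceExhausted) is raised.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted."""


class AdaptiveLimiter:
    """Illustrative limiter: spaces calls out and halves the rate on quota errors."""

    def __init__(
        self,
        initial_rate: float = 5.0,
        min_rate: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = initial_rate  # allowed requests per second
        self.min_rate = min_rate
        self._sleep = sleep  # injectable so callers can skip real waiting

    def call(self, fn: Callable[[], T], max_retries: int = 3) -> T:
        for attempt in range(max_retries + 1):
            self._sleep(1.0 / self.rate)  # crude spacing between requests
            try:
                return fn()
            except QuotaExceeded:
                if attempt == max_retries:
                    raise
                # Assumed policy: halve the rate after every quota error.
                self.rate = max(self.min_rate, self.rate / 2)
        raise RuntimeError("unreachable")
```

The starting rate of 5 mirrors the `initial_rate_limit` default; the real enforcement window and recovery logic live in the library and are not reproduced here.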
Usage
# Set up your GCP environment
# https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal
from phoenix.evals.models import GeminiModel

# Basic usage with defaults (gemini-2.5-flash)
model = GeminiModel()

# Specify project and model
model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-gcp-project",
    location="us-central1",
)
response = model("Explain the concept of attention mechanisms in transformers.")
print(response)
Code Reference
Source Location
| Property | Value |
|---|---|
| Repository | Arize-ai/phoenix |
| File | packages/phoenix-evals/src/phoenix/evals/legacy/models/vertex.py |
| Lines | 250 |
| Module | phoenix.evals.legacy.models.vertex |
Class Signature
@dataclass
class GeminiModel(BaseModel):
    project: Optional[str] = None
    location: Optional[str] = None
    credentials: Optional["Credentials"] = None
    default_concurrency: int = 5
    model: str = "gemini-2.5-flash"
    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)
    initial_rate_limit: int = 5
    timeout: int = 120
    model_kwargs: Dict[str, Any] = field(default_factory=dict)
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| project | Optional[str] | None | GCP project ID. |
| location | Optional[str] | None | GCP location/region (defaults to us-central1 if not set). |
| credentials | Optional[Credentials] | None | Google auth credentials. |
| default_concurrency | int | 5 | Reduced concurrency to avoid connection pool limits. |
| model | str | "gemini-2.5-flash" | The Gemini model name to use. |
| temperature | float | 0.0 | Sampling temperature. |
| max_tokens | int | 1024 | Maximum output tokens. |
| top_p | float | 1 | Nucleus sampling probability mass. |
| top_k | int | 32 | Top-K sampling cutoff. |
| stop_sequences | List[str] | [] | Sequences that halt generation. |
| initial_rate_limit | int | 5 | Initial requests-per-second rate limit. |
| timeout | int | 120 | Timeout for API requests in seconds. |
| model_kwargs | Dict[str, Any] | {} | Additional keyword arguments passed to the GenerativeModel constructor. |
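As a rough illustration of how the sampling parameters above could feed the `generation_config` property, the sketch below mirrors them in a small dataclass. `SamplingSettings` is a hypothetical name, and the mapping of `max_tokens` to `max_output_tokens` is an assumption based on the field names of the VertexAI GenerationConfig, not a copy of the library code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SamplingSettings:
    """Hypothetical mirror of the sampling-related GeminiModel fields."""

    temperature: float = 0.0
    max_tokens: int = 1024
    top_p: float = 1
    top_k: int = 32
    stop_sequences: List[str] = field(default_factory=list)

    @property
    def generation_config(self) -> Dict[str, Any]:
        # Assumed mapping onto VertexAI GenerationConfig field names.
        return {
            "temperature": self.temperature,
            "max_output_tokens": self.max_tokens,
            "top_p": self.top_p,
            "top_k": self.top_k,
            "stop_sequences": self.stop_sequences,
        }


settings = SamplingSettings(max_tokens=256, stop_sequences=["END"])
config = settings.generation_config
```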
Model Token Limit Mapping
| Model | Token Limit |
|---|---|
| gemini-pro | 32,760 |
| gemini-pro-vision | 16,384 |
| gemini-1.5-flash | 1,048,576 |
| gemini-1.5-pro | 2,097,152 |
| gemini-2.0-flash | 1,048,576 |
| gemini-2.0-flash-lite | 1,048,576 |
| gemini-2.5-flash | 1,048,576 |
| gemini-2.5-flash-lite | 1,048,576 |
| gemini-2.5-pro | 2,097,152 |
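One way such a mapping might be consulted is sketched below. The dictionary values are copied from the table; the function name `lookup_token_limit`, the longest-prefix matching (so that versioned names resolve to their family), and the fallback default are all assumptions made for illustration and may not match how the wrapper resolves limits.

```python
from typing import Dict

# Values copied from the token limit table.
MODEL_TOKEN_LIMITS: Dict[str, int] = {
    "gemini-pro": 32_760,
    "gemini-pro-vision": 16_384,
    "gemini-1.5-flash": 1_048_576,
    "gemini-1.5-pro": 2_097_152,
    "gemini-2.0-flash": 1_048_576,
    "gemini-2.0-flash-lite": 1_048_576,
    "gemini-2.5-flash": 1_048_576,
    "gemini-2.5-flash-lite": 1_048_576,
    "gemini-2.5-pro": 2_097_152,
}


def lookup_token_limit(model: str, default: int = 32_760) -> int:
    """Return the limit for the longest table key that prefixes `model`."""
    matches = [name for name in MODEL_TOKEN_LIMITS if model.startswith(name)]
    if not matches:
        return default  # hypothetical fallback for unknown model names
    return MODEL_TOKEN_LIMITS[max(matches, key=len)]
```

Picking the longest matching key keeps "gemini-pro-vision" from being resolved through the shorter "gemini-pro" entry.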
Key Methods
| Method | Signature | Description |
|---|---|---|
| __post_init__ | (self) -> None | Initializes client, VertexAI, and rate limiter. |
| reload_client | (self) -> None | Reinitializes the VertexAI client and model. |
| _init_client | (self) -> None | Imports VertexAI SDK and creates the GenerativeModel instance. |
| _init_vertex_ai | (self) -> None | Calls vertexai.init() with project, location, and credentials. |
| _init_rate_limiter | (self) -> None | Configures rate limiter with ResourceExhausted. |
| _generate_with_extra | (self, prompt, **kwargs) -> Tuple[str, ExtraInfo] | Synchronous generation with inline rate limiting. |
| _async_generate_with_extra | async (self, prompt, **kwargs) -> Tuple[str, ExtraInfo] | Native async generation via generate_content_async(). |
| generation_config | @property -> Dict[str, Any] | Returns the generation configuration dictionary. |
| _extract_text | (self, response: Any) -> str | Extracts candidate text with verbose error logging. |
| _extract_usage | (self, response: Any) -> Optional[Usage] | Extracts token usage including thinking tokens. |
| _construct_prompt | (self, prompt: MultimodalPrompt) -> str | Converts multimodal prompt to text-only string. |
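To make the usage-extraction step more concrete, here is a minimal sketch of reading token counts from a response's `usage_metadata`. `TokenUsage` and `extract_usage` are hypothetical names, the response is faked with `SimpleNamespace`, and folding thinking tokens into the completion count is an assumption; the real `_extract_usage` may report them differently.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class TokenUsage:
    """Hypothetical stand-in for the Usage record carried in ExtraInfo."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def extract_usage(response: Any) -> Optional[TokenUsage]:
    meta = getattr(response, "usage_metadata", None)
    if meta is None:
        return None  # no usage information on this response
    prompt = getattr(meta, "prompt_token_count", 0) or 0
    candidates = getattr(meta, "candidates_token_count", 0) or 0
    thoughts = getattr(meta, "thoughts_token_count", 0) or 0
    # Assumption: thinking tokens are counted as part of the completion.
    completion = candidates + thoughts
    return TokenUsage(prompt, completion, prompt + completion)


# Fake response object; no API call is made.
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=12,
        candidates_token_count=30,
        thoughts_token_count=8,
    )
)
usage = extract_usage(fake_response)
```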
Import
from phoenix.evals.models import GeminiModel
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | Union[str, MultimodalPrompt] | A text string or multimodal prompt (converted to text-only). |
| Input (optional) | Optional[str] | Instruction parameter (ignored; stripped before API call). |
| Output | str | Generated text response from the Gemini model. |
| Output (with extra) | Tuple[str, ExtraInfo] | Generated text paired with ExtraInfo containing optional Usage token counts. |
| Error | ImportError | Raised if vertexai package is not installed. |
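The ImportError row describes a common guard for optional dependencies. A minimal sketch of that pattern follows; `require_module` and its message format are hypothetical and are not taken from the phoenix source.

```python
import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import `name`, re-raising ImportError with an install hint on failure."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import '{name}'. Try: {install_hint}"
        ) from exc
```

A caller could then write `require_module("vertexai", "pip install google-cloud-aiplatform")` before constructing the model, so that a missing SDK fails early with an actionable message.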
Usage Examples
Basic Generation
from phoenix.evals.models import GeminiModel

model = GeminiModel(
    model="gemini-2.5-flash",
    project="my-project",
    temperature=0.0,
)
response = model("What are the benefits of using vector databases?")
print(response)
With Custom Credentials
from google.oauth2 import service_account
from phoenix.evals.models import GeminiModel

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json"
)
model = GeminiModel(
    model="gemini-2.5-pro",
    project="my-project",
    location="europe-west1",
    credentials=credentials,
)
Async Generation
import asyncio
from phoenix.evals.models import GeminiModel

model = GeminiModel(model="gemini-2.5-flash", project="my-project")

async def generate():
    result = await model._async_generate("Explain RAG architecture.")
    return result

response = asyncio.run(generate())
print(response)
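When many prompts are issued concurrently, the reduced `default_concurrency=5` suggests keeping the number of in-flight requests small. The sketch below shows one generic way to do that with an asyncio semaphore. `run_bounded` and `fake_generate` are hypothetical helpers (the stub replaces a real model call so the example runs offline), and phoenix's own executors may schedule work differently.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    prompts: List[str],
    generate: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> List[str]:
    """Run `generate` over prompts with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(prompt: str) -> str:
        async with semaphore:
            return await generate(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(worker(p) for p in prompts))


async def fake_generate(prompt: str) -> str:
    """Stub standing in for a model call; performs no network access."""
    await asyncio.sleep(0)
    return prompt.upper()


results = asyncio.run(run_bounded(["alpha", "beta", "gamma"], fake_generate))
```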