Implementation:Openai Evals GeminiSolver
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, LLM Provider Integration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Concrete solver, provided by the evals library, for running evaluation tasks against Google's Gemini API.
Description
GeminiSolver is a Solver subclass that generates responses through Google's Generative AI (Gemini) API. It converts the evals message format into the Google-specific structure, handles safety-filter blocks and API errors gracefully, and supports thread-safe operation via a shared client.
The module also defines the GoogleMessage dataclass, which serves as an intermediate representation between the evals Message type and the dictionary format expected by the Gemini SDK.
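The intermediate representation can be pictured with a minimal sketch. It is not the library's code: `EvalsMessage` is a hypothetical stand-in for the evals Message type, `GoogleMessageSketch` is an illustrative name, and the role table simply restates the mapping described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalsMessage:
    """Hypothetical stand-in for the evals Message type (role + content)."""
    role: str
    content: str


@dataclass
class GoogleMessageSketch:
    """Illustrative copy of the shape described above, not the real class."""
    role: str         # "user" or "model"
    parts: List[str]  # text fragments of the turn

    def to_dict(self) -> dict:
        # Dictionary form of a single turn.
        return {"role": self.role, "parts": self.parts}

    @staticmethod
    def from_evals_message(msg: EvalsMessage) -> "GoogleMessageSketch":
        # system and user collapse to "user"; assistant becomes "model".
        role_map = {"system": "user", "user": "user", "assistant": "model"}
        return GoogleMessageSketch(role=role_map[msg.role], parts=[msg.content])


system_turn = GoogleMessageSketch.from_evals_message(
    EvalsMessage(role="system", content="You are a geography expert.")
)
assistant_turn = GoogleMessageSketch.from_evals_message(
    EvalsMessage(role="assistant", content="Hello")
)
```

Because system messages are mapped onto the user role, a system prompt followed by a user message produces two consecutive user turns, which is why a merging step is needed afterwards.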
Key behaviours:
- GoogleMessage dataclass -- A lightweight @dataclass with role (either "user" or "model") and parts (a list of strings). Provides to_dict() for serialisation and the static factory from_evals_message(), which maps evals roles to Google roles (system and user become "user"; assistant becomes "model").
- Role mapping and message merging -- The Gemini API requires strictly alternating user/model turns, and the final message must come from user. The static method _convert_msgs_to_google_format enforces both constraints by merging consecutive same-role messages (joining parts with newlines) and asserting that the last message has the user role.
- Safety settings -- All four harm categories (HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT) are set to BLOCK_NONE so that evaluation prompts are not silently filtered. When the API does block a response, the solver captures the prompt_feedback as both the output string and the error field of the SolverResult.
- Error handling -- GoogleAPIError and specific ValueError exceptions (known quick-accessor failures) are caught and returned as SolverResult objects with the error message as output, preventing a single failed sample from crashing an entire eval run.
- Thread safety -- The underlying glm_client is created once during __init__ (via get_default_generative_client()) and manually assigned to each GenerativeModel instance before generation. A custom __deepcopy__ ensures that when the solver is copied across threads, all copies share the same client rather than each creating a new one.
- Retry logic -- Transient API failures (RetryError, TooManyRequests, ResourceExhausted) are retried with exponential back-off via create_retrying.
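The merging rule from the list above can be sketched in a few lines. This is an illustration under assumptions rather than the body of _convert_msgs_to_google_format: `Turn` and `merge_consecutive_turns` are hypothetical names, and the input is assumed to have already been mapped to the "user" and "model" roles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Hypothetical stand-in for an already role-mapped message."""
    role: str         # "user" or "model"
    parts: List[str]


def merge_consecutive_turns(turns: List[Turn]) -> List[Turn]:
    """Merge neighbouring same-role turns so that roles strictly alternate."""
    merged: List[Turn] = []
    for turn in turns:
        if merged and merged[-1].role == turn.role:
            # Same role as the previous turn: join the text with a newline.
            merged[-1].parts = ["\n".join(merged[-1].parts + turn.parts)]
        else:
            # Copy the parts so the caller's objects are not mutated.
            merged.append(Turn(role=turn.role, parts=list(turn.parts)))
    # The final turn has to come from the user.
    assert merged and merged[-1].role == "user", "last message must be from user"
    return merged


conversation = [
    Turn("user", ["You are a geography expert."]),  # mapped task description
    Turn("user", ["Name the largest country."]),
    Turn("model", ["Russia."]),
    Turn("user", ["And the second largest?"]),
]
alternating = merge_consecutive_turns(conversation)
```

Here the task description and the first user message collapse into a single user turn, leaving a user, model, user sequence. A conversation ending on a model turn would trip the assertion instead of being sent.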
Usage
Import GeminiSolver to benchmark or evaluate prompts against a Google Gemini model. It is typically specified by class path in a YAML eval configuration. The GEMINI_API_KEY environment variable must be set and the google-generativeai package must be installed.
Code Reference
Source Location
- Repository: Openai_Evals
- File: evals/solvers/providers/google/gemini_solver.py
- Lines: 1-211
Signature
@dataclass
class GoogleMessage:
    role: str
    parts: list[str]

    def to_dict(self) -> dict:

    @staticmethod
    def from_evals_message(msg: Message) -> "GoogleMessage":


class GeminiSolver(Solver):
    def __init__(
        self,
        model_name: str,
        generation_config: Dict[str, Any] = {},
        postprocessors: list[str] = [],
        registry: Any = None,
    ):

    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:

    @staticmethod
    def _convert_msgs_to_google_format(msgs: list[Message]) -> list[GoogleMessage]:

    @property
    def name(self) -> str:

    @property
    def model_version(self) -> Union[str, dict]:

    def __deepcopy__(self, memo) -> "GeminiSolver":
Import
from evals.solvers.providers.google.gemini_solver import GeminiSolver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | Google Gemini model identifier (e.g. "gemini-pro", "gemini-1.5-pro-latest"). |
| generation_config | Dict[str, Any] | No (default {}) | Keyword arguments forwarded to genai.GenerationConfig (e.g. temperature, max_output_tokens, top_p, top_k). |
| postprocessors | list[str] | No (default []) | Fully-qualified class paths of PostProcessor instances to apply to the solver output. |
| registry | Any | No (default None) | Unused; accepted for interface compatibility with the solver registry. |
| task_state | TaskState | Yes (at solve time) | The evaluation task state containing task_description and messages. The task_description is prepended as an initial user message. |
Outputs
| Name | Type | Description |
|---|---|---|
| result | SolverResult | Contains the model's text response in output. If the request was blocked by safety filters or hit an API error, output holds the error description and the error field holds the exception or feedback object. |
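The output contract means a failed sample still yields a result object rather than an exception. The sketch below illustrates that pattern only: `SketchResult` and `solve_safely` are hypothetical names, and catching every ValueError is a simplification, since the solver is described as catching GoogleAPIError and only specific ValueError cases.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for a result with output and error fields."""
    output: str
    error: Optional[Any] = None


def solve_safely(generate: Callable[[], str]) -> SketchResult:
    """Run one generation call and turn a failure into a result object."""
    try:
        return SketchResult(output=generate())
    except ValueError as exc:  # simplified stand-in for the provider errors
        # The error text becomes the output; the exception is kept in error.
        return SketchResult(output=str(exc), error=exc)


def answer_ok() -> str:
    return "Paris"


def answer_blocked() -> str:
    raise ValueError("response blocked by safety filter")


results: List[SketchResult] = [solve_safely(answer_ok), solve_safely(answer_blocked)]
failed = [r for r in results if r.error is not None]
```

A caller that aggregates scores can therefore inspect the error field to separate genuine model answers from blocked or failed samples, without wrapping each call in its own try/except.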
Usage Examples
from evals.solvers.providers.google.gemini_solver import GeminiSolver, GoogleMessage
from evals.task_state import TaskState, Message

# Instantiate the solver
solver = GeminiSolver(
    model_name="gemini-pro",
    generation_config={"temperature": 0.5, "max_output_tokens": 256},
)

# Build a task state
task_state = TaskState(
    task_description="You are a geography expert.",
    messages=[
        Message(role="user", content="Name the three largest countries by area."),
    ],
)

# Solve the task
result = solver(task_state)
print(result.output)
# e.g. "The three largest countries by area are Russia, Canada, and the United States."

# Using GoogleMessage directly for format inspection
gmsg = GoogleMessage.from_evals_message(Message(role="assistant", content="Hello"))
print(gmsg.role)       # "model"
print(gmsg.parts)      # ["Hello"]
print(gmsg.to_dict())  # {"role": "model", "parts": ["Hello"]}

# Thread-safe copying (glm_client is shared)
import copy

solver_copy = copy.deepcopy(solver)
assert solver_copy.glm_client is solver.glm_client  # same client object
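The client-sharing behaviour in the last example can be reproduced without the Gemini SDK. The following is a minimal sketch of one way a custom __deepcopy__ may share a single attribute while copying the rest. `SharedClientHolder` and its attribute names are hypothetical, and the real solver's implementation may differ.

```python
import copy


class SharedClientHolder:
    """Hypothetical class whose deep copies all reuse one client object."""

    def __init__(self, model_name, client):
        self.model_name = model_name
        self.generation_config = {"temperature": 0.5}
        self.client = client  # expensive resource, meant to be shared

    def __deepcopy__(self, memo):
        cls = self.__class__
        new = cls.__new__(cls)  # skip __init__, so no new client is created
        memo[id(self)] = new
        for key, value in self.__dict__.items():
            if key == "client":
                setattr(new, key, value)  # share the same client object
            else:
                setattr(new, key, copy.deepcopy(value, memo))  # copy the rest
        return new


original = SharedClientHolder("example-model", client=object())
duplicate = copy.deepcopy(original)
```

The copy gets its own generation_config dictionary, so per-thread changes to settings do not leak between copies, while the client reference stays identical across all of them.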