Implementation:Openai Evals GeminiSolver

Knowledge Sources	Openai_Evals
Domains	Evaluation, LLM Provider Integration
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete solver, provided by the evals library, for running evaluation tasks against Google's Gemini API.

Description

GeminiSolver is a Solver subclass that generates responses through Google's Generative AI (Gemini) API. It converts the evals message format into the Google-specific structure, handles safety-filter blocks and API errors gracefully, and supports thread-safe operation via a shared client.

The module also defines the GoogleMessage dataclass, which serves as an intermediate representation between the evals Message type and the dictionary format expected by the Gemini SDK.
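
The role mapping described above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the library's code: SketchMessage stands in for the evals Message type, SketchGoogleMessage mirrors the described GoogleMessage, and raising ValueError on an unknown role is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SketchMessage:
    """Hypothetical stand-in for the evals Message type (role + content only)."""
    role: str
    content: str


@dataclass
class SketchGoogleMessage:
    """Illustrative mirror of the GoogleMessage dataclass described above."""
    role: str          # "user" or "model"
    parts: list[str]   # text fragments of the turn

    def to_dict(self) -> dict:
        # Plain dictionary with the two fields, as described for to_dict().
        return {"role": self.role, "parts": self.parts}

    @staticmethod
    def from_evals_message(msg: SketchMessage) -> "SketchGoogleMessage":
        # system and user both map to "user"; assistant maps to "model".
        role_map = {"system": "user", "user": "user", "assistant": "model"}
        if msg.role not in role_map:
            # Assumption of this sketch: unknown roles are rejected loudly.
            raise ValueError(f"unsupported role: {msg.role!r}")
        return SketchGoogleMessage(role=role_map[msg.role], parts=[msg.content])
```

Because system messages map to "user", a task description followed by a user message yields two consecutive "user" turns, which is why the merging step listed under the key behaviours is needed.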

Key behaviours:

GoogleMessage dataclass -- A lightweight @dataclass with role (either "user" or "model") and parts (a list of strings). Provides to_dict() for serialisation and the static factory from_evals_message() which maps evals roles to Google roles (system and user become "user"; assistant becomes "model").
Role mapping and message merging -- The Gemini API requires strictly alternating user/model turns, and the final message must have the user role. The static method _convert_msgs_to_google_format enforces both constraints: it merges consecutive same-role messages (joining their parts with newlines) and asserts that the last message has the user role.
Safety settings -- All four harm categories (HARASSMENT, HATE_SPEECH, SEXUALLY_EXPLICIT, DANGEROUS_CONTENT) are set to BLOCK_NONE so that evaluation prompts are not silently filtered. When the API does block a response, the solver captures the prompt_feedback as both the output string and the error field of the SolverResult.
Error handling -- GoogleAPIError and specific ValueError exceptions (known quick-accessor failures) are caught and returned as SolverResult objects with the error message as output, preventing a single failed sample from crashing an entire eval run.
Thread safety -- The underlying glm_client is created once during __init__ (via get_default_generative_client()) and manually assigned to each GenerativeModel instance before generation. A custom __deepcopy__ ensures that when the solver is copied across threads, all copies share the same client rather than each creating a new one.
Retry logic -- Transient API failures (RetryError, TooManyRequests, ResourceExhausted) are retried with exponential back-off via create_retrying.
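
The merging rule from the list above can be illustrated with a short sketch. It is not the body of _convert_msgs_to_google_format: Turn and merge_consecutive_turns are hypothetical names, and the sketch raises ValueError where the source is described as using an assertion.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical stand-in for a Google-format message."""
    role: str          # "user" or "model"
    parts: list[str]


def merge_consecutive_turns(turns: list[Turn]) -> list[Turn]:
    """Collapse runs of same-role turns so that roles strictly alternate."""
    merged: list[Turn] = []
    for turn in turns:
        text = "\n".join(turn.parts)
        if merged and merged[-1].role == turn.role:
            # Same role as the previous turn: fold the text into that turn,
            # joining with a newline.
            merged[-1].parts = ["\n".join(merged[-1].parts + [text])]
        else:
            # Role changed (or first turn): start a new turn. A fresh object
            # is created so the caller's input is never mutated.
            merged.append(Turn(turn.role, [text]))
    if not merged or merged[-1].role != "user":
        # The described solver asserts this; the sketch raises instead.
        raise ValueError("the final turn must have the 'user' role")
    return merged
```

For example, a task description (mapped to "user") followed by a user question collapses into a single "user" turn whose text is the two strings joined by a newline.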

Usage

Import GeminiSolver to benchmark or evaluate prompts against a Google Gemini model. It is typically specified by class path in a YAML eval configuration. The GEMINI_API_KEY environment variable must be set and the google-generativeai package must be installed.
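
Both prerequisites can be checked before an eval run starts. The helper below is a hypothetical convenience, not part of evals; it assumes only that the google-generativeai package is importable as google.generativeai.

```python
import importlib.util
import os
from typing import Mapping, Optional


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    try:
        # find_spec raises ModuleNotFoundError if the parent "google"
        # package is absent, so treat that the same as "not found".
        found = importlib.util.find_spec("google.generativeai") is not None
    except (ImportError, ValueError):
        found = False
    if not found:
        problems.append("google-generativeai is not installed")
    return problems
```

Calling missing_prerequisites() with no argument inspects the real process environment; passing a mapping makes the check easy to exercise in tests.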

Code Reference

Source Location

Repository: Openai_Evals
File: evals/solvers/providers/google/gemini_solver.py
Lines: 1-211

Signature

@dataclass
class GoogleMessage:
    role: str
    parts: list[str]
    def to_dict(self) -> dict: ...
    @staticmethod
    def from_evals_message(msg: Message) -> "GoogleMessage": ...

class GeminiSolver(Solver):
    def __init__(
        self,
        model_name: str,
        generation_config: Dict[str, Any] = {},
        postprocessors: list[str] = [],
        registry: Any = None,
    ): ...
    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult: ...
    @staticmethod
    def _convert_msgs_to_google_format(msgs: list[Message]) -> list[GoogleMessage]: ...
    @property
    def name(self) -> str: ...
    @property
    def model_version(self) -> Union[str, dict]: ...
    def __deepcopy__(self, memo) -> "GeminiSolver": ...

Import

from evals.solvers.providers.google.gemini_solver import GeminiSolver

I/O Contract

Inputs

Name	Type	Required	Description
model_name	`str`	Yes	Google Gemini model identifier (e.g. `"gemini-pro"`, `"gemini-1.5-pro-latest"`).
generation_config	`Dict[str, Any]`	No (default `{}`)	Keyword arguments forwarded to `genai.GenerationConfig` (e.g. `temperature`, `max_output_tokens`, `top_p`, `top_k`).
postprocessors	`list[str]`	No (default `[]`)	Fully-qualified class paths of PostProcessor classes to apply to the solver output.
registry	`Any`	No (default `None`)	Unused; accepted for interface compatibility with the solver registry.
task_state	`TaskState`	Yes (at solve time)	The evaluation task state containing `task_description` and `messages`. The `task_description` is prepended as an initial `user` message.

Outputs

Name	Type	Description
result	`SolverResult`	Contains the model's text response in `output`. If the request was blocked by safety filters or hit an API error, `output` holds the error description and the `error` field holds the exception or feedback object.
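
The error-as-result contract can be illustrated without the Gemini SDK. In this sketch SketchResult, SketchAPIError and solve_safely are hypothetical stand-ins; the real solver is described as catching GoogleAPIError and specific ValueError cases and returning a SolverResult.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for SolverResult: text output plus optional error."""
    output: str
    error: Optional[Any] = None


class SketchAPIError(Exception):
    """Hypothetical stand-in for a provider API error."""


def solve_safely(generate: Callable[[], str]) -> SketchResult:
    """Run one generation call; turn a known failure into a result object."""
    try:
        return SketchResult(output=generate())
    except SketchAPIError as exc:
        # The error text becomes the output and the exception is kept in
        # `error`, so one failed sample does not abort the whole run.
        return SketchResult(output=str(exc), error=exc)
```

A caller can then distinguish the two cases by checking whether the error field is None, instead of wrapping every sample in its own try/except.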

Usage Examples

from evals.solvers.providers.google.gemini_solver import GeminiSolver, GoogleMessage
from evals.task_state import TaskState, Message

# Instantiate the solver
solver = GeminiSolver(
    model_name="gemini-pro",
    generation_config={"temperature": 0.5, "max_output_tokens": 256},
)

# Build a task state
task_state = TaskState(
    task_description="You are a geography expert.",
    messages=[
        Message(role="user", content="Name the three largest countries by area."),
    ],
)

# Solve the task
result = solver(task_state)
print(result.output)
# e.g. "The three largest countries by area are Russia, Canada, and the United States."

# Using GoogleMessage directly for format inspection
gmsg = GoogleMessage.from_evals_message(Message(role="assistant", content="Hello"))
print(gmsg.role)   # "model"
print(gmsg.parts)  # ["Hello"]
print(gmsg.to_dict())  # {"role": "model", "parts": ["Hello"]}

# Thread-safe copying (glm_client is shared)
import copy
solver_copy = copy.deepcopy(solver)
assert solver_copy.glm_client is solver.glm_client  # same client object
