Implementation:Openai Evals AnthropicSolver

Knowledge Sources	Openai_Evals
Domains	Evaluation, LLM Provider Integration
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete solver for running evaluation tasks against the Anthropic (Claude) API provided by the evals library.

Description

AnthropicSolver is a Solver subclass that delegates text generation to the Anthropic Messages API. It accepts an evals TaskState, converts the contained message history into the format required by the Anthropic SDK, and returns a SolverResult with the model's response.

Key behaviours:

Role mapping -- OpenAI-style roles (system, user, assistant) are translated to Anthropic's two-role scheme (user / assistant) via the module-level oai_to_anthropic_role dictionary. The system role is mapped to user.
Message merging -- Anthropic requires strictly alternating user/assistant turns. Consecutive messages that share the same role after mapping are merged into a single turn with their content blocks concatenated.
System prompt -- The task_description from TaskState is passed as the top-level system parameter to messages.create, separate from the message list.
Retry logic -- The module-level helper anthropic_create_retrying wraps the SDK's client.messages.create with exponential back-off (via evals.utils.api_utils.create_retrying) for transient errors such as RateLimitError, APIConnectionError, APITimeoutError, and InternalServerError.
Usage conversion -- anth_to_openai_usage converts an Anthropic Usage object into a dictionary with OpenAI-compatible keys (prompt_tokens, completion_tokens, total_tokens) so that the evals logging infrastructure can record token counts uniformly.

Usage

Import AnthropicSolver when you need to evaluate a prompt or benchmark against an Anthropic Claude model. The solver is typically referenced by class path in a YAML eval spec. It requires the ANTHROPIC_API_KEY environment variable to be set and the anthropic Python package to be installed.

Code Reference

Source Location

Repository: Openai_Evals
File: evals/solvers/providers/anthropic/anthropic_solver.py
Lines: 1-142

Signature

class AnthropicSolver(Solver):
    def __init__(
        self,
        model_name: str,
        max_tokens: int = 512,
        postprocessors: list[str] = [],
        extra_options: Optional[dict] = {},
        registry: Any = None,
    ):
    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
    @property
    def name(self) -> str:
    @property
    def model_version(self) -> Union[str, dict]:
    @staticmethod
    def _convert_msgs_to_anthropic_format(msgs: list[Message]) -> list[MessageParam]:

def anthropic_create_retrying(client: Anthropic, *args, **kwargs):
def anth_to_openai_usage(anth_usage: Usage) -> dict:

Import

from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver

I/O Contract

Inputs

Name	Type	Required	Description
model_name	`str`	Yes	Anthropic model identifier (e.g. `"claude-3-opus-20240229"`).
max_tokens	`int`	No (default 512)	Maximum number of tokens the model may generate in its response.
postprocessors	`list[str]`	No (default `[]`)	Fully-qualified class paths of PostProcessor instances to apply to the solver output.
extra_options	`Optional[dict]`	No (default `{}`)	Additional keyword arguments forwarded to `client.messages.create` (e.g. `temperature`, `top_p`).
registry	`Any`	No (default `None`)	Unused; accepted for interface compatibility with the solver registry.
task_state	`TaskState`	Yes (at solve time)	The evaluation task state containing `task_description` (system prompt) and `messages` (conversation history).

Outputs

Name	Type	Description
result	`SolverResult`	Contains the model's text response in `output` and the raw Anthropic `ContentBlock` list in `raw_completion_result`.

Usage Examples

from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver
from evals.task_state import TaskState, Message

# Instantiate the solver with a Claude model
solver = AnthropicSolver(
    model_name="claude-3-opus-20240229",
    max_tokens=1024,
    extra_options={"temperature": 0.7},
)

# Build a task state
task_state = TaskState(
    task_description="You are a helpful assistant.",
    messages=[
        Message(role="user", content="What is the capital of France?"),
    ],
)

# Solve the task
result = solver(task_state)
print(result.output)  # e.g. "The capital of France is Paris."

# Access solver metadata
print(solver.name)           # "claude-3-opus-20240229"
print(solver.model_version)  # "claude-3-opus-20240229"

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment