Implementation:Openai Evals AnthropicSolver
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, LLM Provider Integration |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Concrete solver for running evaluation tasks against the Anthropic (Claude) API provided by the evals library.
Description
AnthropicSolver is a Solver subclass that delegates text generation to the Anthropic Messages API. It accepts an evals TaskState, converts the contained message history into the format required by the Anthropic SDK, and returns a SolverResult with the model's response.
Key behaviours:
- Role mapping -- OpenAI-style roles (
system,user,assistant) are translated to Anthropic's two-role scheme (user/assistant) via the module-leveloai_to_anthropic_roledictionary. Thesystemrole is mapped touser. - Message merging -- Anthropic requires strictly alternating
user/assistantturns. Consecutive messages that share the same role after mapping are merged into a single turn with their content blocks concatenated. - System prompt -- The
task_descriptionfrom TaskState is passed as the top-levelsystemparameter tomessages.create, separate from the message list. - Retry logic -- The module-level helper anthropic_create_retrying wraps the SDK's
client.messages.createwith exponential back-off (viaevals.utils.api_utils.create_retrying) for transient errors such asRateLimitError,APIConnectionError,APITimeoutError, andInternalServerError. - Usage conversion -- anth_to_openai_usage converts an Anthropic
Usageobject into a dictionary with OpenAI-compatible keys (prompt_tokens,completion_tokens,total_tokens) so that the evals logging infrastructure can record token counts uniformly.
Usage
Import AnthropicSolver when you need to evaluate a prompt or benchmark against an Anthropic Claude model. The solver is typically referenced by class path in a YAML eval spec. It requires the ANTHROPIC_API_KEY environment variable to be set and the anthropic Python package to be installed.
Code Reference
Source Location
- Repository: Openai_Evals
- File: evals/solvers/providers/anthropic/anthropic_solver.py
- Lines: 1-142
Signature
class AnthropicSolver(Solver):
def __init__(
self,
model_name: str,
max_tokens: int = 512,
postprocessors: list[str] = [],
extra_options: Optional[dict] = {},
registry: Any = None,
):
def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
@property
def name(self) -> str:
@property
def model_version(self) -> Union[str, dict]:
@staticmethod
def _convert_msgs_to_anthropic_format(msgs: list[Message]) -> list[MessageParam]:
def anthropic_create_retrying(client: Anthropic, *args, **kwargs):
def anth_to_openai_usage(anth_usage: Usage) -> dict:
Import
from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str |
Yes | Anthropic model identifier (e.g. "claude-3-opus-20240229").
|
| max_tokens | int |
No (default 512) | Maximum number of tokens the model may generate in its response. |
| postprocessors | list[str] |
No (default []) |
Fully-qualified class paths of PostProcessor instances to apply to the solver output. |
| extra_options | Optional[dict] |
No (default {}) |
Additional keyword arguments forwarded to client.messages.create (e.g. temperature, top_p).
|
| registry | Any |
No (default None) |
Unused; accepted for interface compatibility with the solver registry. |
| task_state | TaskState |
Yes (at solve time) | The evaluation task state containing task_description (system prompt) and messages (conversation history).
|
Outputs
| Name | Type | Description |
|---|---|---|
| result | SolverResult |
Contains the model's text response in output and the raw Anthropic ContentBlock list in raw_completion_result.
|
Usage Examples
from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver
from evals.task_state import TaskState, Message
# Instantiate the solver with a Claude model
solver = AnthropicSolver(
model_name="claude-3-opus-20240229",
max_tokens=1024,
extra_options={"temperature": 0.7},
)
# Build a task state
task_state = TaskState(
task_description="You are a helpful assistant.",
messages=[
Message(role="user", content="What is the capital of France?"),
],
)
# Solve the task
result = solver(task_state)
print(result.output) # e.g. "The capital of France is Paris."
# Access solver metadata
print(solver.name) # "claude-3-opus-20240229"
print(solver.model_version) # "claude-3-opus-20240229"