Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Openai Evals AnthropicSolver

From Leeroopedia
Revision as of 13:34, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Openai_Evals_AnthropicSolver.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Evaluation, LLM Provider Integration
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete solver for running evaluation tasks against the Anthropic (Claude) API provided by the evals library.

Description

AnthropicSolver is a Solver subclass that delegates text generation to the Anthropic Messages API. It accepts an evals TaskState, converts the contained message history into the format required by the Anthropic SDK, and returns a SolverResult with the model's response.

Key behaviours:

  • Role mapping -- OpenAI-style roles (system, user, assistant) are translated to Anthropic's two-role scheme (user / assistant) via the module-level oai_to_anthropic_role dictionary. The system role is mapped to user.
  • Message merging -- Anthropic requires strictly alternating user/assistant turns. Consecutive messages that share the same role after mapping are merged into a single turn with their content blocks concatenated.
  • System prompt -- The task_description from TaskState is passed as the top-level system parameter to messages.create, separate from the message list.
  • Retry logic -- The module-level helper anthropic_create_retrying wraps the SDK's client.messages.create with exponential back-off (via evals.utils.api_utils.create_retrying) for transient errors such as RateLimitError, APIConnectionError, APITimeoutError, and InternalServerError.
  • Usage conversion -- anth_to_openai_usage converts an Anthropic Usage object into a dictionary with OpenAI-compatible keys (prompt_tokens, completion_tokens, total_tokens) so that the evals logging infrastructure can record token counts uniformly.

Usage

Import AnthropicSolver when you need to evaluate a prompt or benchmark against an Anthropic Claude model. The solver is typically referenced by class path in a YAML eval spec. It requires the ANTHROPIC_API_KEY environment variable to be set and the anthropic Python package to be installed.

Code Reference

Source Location

Signature

class AnthropicSolver(Solver):
    def __init__(
        self,
        model_name: str,
        max_tokens: int = 512,
        postprocessors: list[str] = [],
        extra_options: Optional[dict] = {},
        registry: Any = None,
    ):
    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
    @property
    def name(self) -> str:
    @property
    def model_version(self) -> Union[str, dict]:
    @staticmethod
    def _convert_msgs_to_anthropic_format(msgs: list[Message]) -> list[MessageParam]:

def anthropic_create_retrying(client: Anthropic, *args, **kwargs):
def anth_to_openai_usage(anth_usage: Usage) -> dict:

Import

from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver

I/O Contract

Inputs

Name Type Required Description
model_name str Yes Anthropic model identifier (e.g. "claude-3-opus-20240229").
max_tokens int No (default 512) Maximum number of tokens the model may generate in its response.
postprocessors list[str] No (default []) Fully-qualified class paths of PostProcessor instances to apply to the solver output.
extra_options Optional[dict] No (default {}) Additional keyword arguments forwarded to client.messages.create (e.g. temperature, top_p).
registry Any No (default None) Unused; accepted for interface compatibility with the solver registry.
task_state TaskState Yes (at solve time) The evaluation task state containing task_description (system prompt) and messages (conversation history).

Outputs

Name Type Description
result SolverResult Contains the model's text response in output and the raw Anthropic ContentBlock list in raw_completion_result.

Usage Examples

from evals.solvers.providers.anthropic.anthropic_solver import AnthropicSolver
from evals.task_state import TaskState, Message

# Instantiate the solver with a Claude model
solver = AnthropicSolver(
    model_name="claude-3-opus-20240229",
    max_tokens=1024,
    extra_options={"temperature": 0.7},
)

# Build a task state
task_state = TaskState(
    task_description="You are a helpful assistant.",
    messages=[
        Message(role="user", content="What is the capital of France?"),
    ],
)

# Solve the task
result = solver(task_state)
print(result.output)  # e.g. "The capital of France is Paris."

# Access solver metadata
print(solver.name)           # "claude-3-opus-20240229"
print(solver.model_version)  # "claude-3-opus-20240229"

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment