Implementation:Openai Evals SelfConsistencySolver

Knowledge Sources	Openai_Evals
Domains	Evaluation, Solvers
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete tool, provided by the evals library, for self-consistency prompting: it samples several chain-of-thought completions and selects a consensus answer by majority vote or by a judge model.

Description

SelfConsistencySolver is a subclass of NestedSolver that implements the self-consistency prompting technique. It works by generating multiple chain-of-thought completions for the same prompt and then selecting the consensus answer. This approach exploits the intuition that correct reasoning paths tend to converge on the same answer, even when intermediate steps differ.

The solver operates in two modes controlled by the mode parameter:

"count" mode (default): Each completion's answer is extracted using the answer_prefix (default: "The answer is"), and the most frequently occurring answer is selected via majority vote using Python's Counter. If no answers can be extracted from any completion, an error is logged.

"judge" mode: All reasoning completions are passed to a judge_solver along with a judge_prompt that asks the judge to determine the consensus answer. If the judge cannot extract a valid answer, "[NO CONSENSUS]" is returned.
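The "count" mode described above can be illustrated with a small standalone sketch. This is not the library's code: the helper name majority_vote, the choice to return None when nothing was extracted, and the sample answers are all assumptions made for illustration. Only the use of Counter and the logged error come from the description.

```python
import logging
from collections import Counter
from typing import Optional

logger = logging.getLogger(__name__)


def majority_vote(answers: list[str]) -> Optional[str]:
    """Return the most frequent answer, or None if the list is empty.

    Hypothetical helper: returning None on an empty list is an assumption.
    """
    if not answers:
        # The description says an error is logged when no answer
        # could be extracted from any completion.
        logger.error("No answers could be extracted from any completion.")
        return None
    # most_common(1) yields [(answer, count)]; on a tie, Counter keeps
    # the answer that was encountered first.
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


# Answers already extracted from five hypothetical completions.
extracted = ["42", "41", "42", "42", "17"]
consensus = majority_vote(extracted)
print(consensus)
```

Because ties fall back to insertion order, an odd num_generations does not by itself guarantee a unique winner when more than two distinct answers appear.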

The solver supports persistent memory through PersistentMemoryCache. When enabled, the cache tracks the CoT prompt and all num_generations reasoning completions as private messages, preserving them across multi-turn evaluations. The private_interaction_length parameter controls how many additional messages beyond the reasoning completions to cache.

The _extract_answer method performs case-insensitive matching on the answer_prefix to locate and extract the answer portion from each completion's raw output text. It raises a ValueError if the prefix is not found, which is caught and logged during the generation loop.
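A minimal sketch of this extraction step, assuming a plain string input rather than a SolverResult. The function name extract_answer, the use of the first prefix occurrence, and the whitespace stripping are assumptions; the actual method may differ in these details.

```python
import logging
import re

logger = logging.getLogger(__name__)


def extract_answer(completion: str, answer_prefix: str = "The answer is") -> str:
    """Return the text that follows answer_prefix, matched case-insensitively.

    Hypothetical stand-in for _extract_answer; raises ValueError when the
    prefix does not occur in the completion.
    """
    match = re.search(re.escape(answer_prefix), completion, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Answer prefix {answer_prefix!r} not found in completion.")
    # Everything after the prefix is treated as the answer (an assumption).
    return completion[match.end():].strip()


# Illustrative completions; the last one lacks the prefix.
completions = [
    "3 + 4 = 7. The answer is 7",
    "Adding the numbers gives seven. THE ANSWER IS 7",
    "I am not sure.",
]

answers = []
for text in completions:
    try:
        answers.append(extract_answer(text))
    except ValueError as err:
        # Failed extractions are logged and skipped, as in the generation loop.
        logger.warning("Skipping completion: %s", err)

print(answers)
```

Completions that fail extraction simply do not contribute a vote, so the effective number of votes in "count" mode can be smaller than num_generations.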

Usage

Import SelfConsistencySolver when you need more robust answers than a single chain-of-thought completion can provide. It is best suited to tasks with discrete answers (e.g., multiple choice, numerical computation), where answers from several reasoning paths can be compared directly and voting on them can improve accuracy over a single sample.

Code Reference

Source Location

Repository: Openai_Evals
File: evals/solvers/nested/self_consistency_solver.py
Lines: 1-150

Signature

class SelfConsistencySolver(NestedSolver):
    def __init__(
        self,
        solver: SolverSpec,
        num_generations: int = 5,
        cot_template: str = DEFAULT_COT_TEMPLATE,
        answer_prefix: str = DEFAULT_ANSWER_PREFIX,
        judge_prompt: Optional[str] = None,
        mode: str = "count",
        persistent_memory: bool = True,
        private_interaction_length: int = 1,
        postprocessors: list[str] = [],
        registry: Any = None,
    ):
        ...

    @property
    def solver(self) -> Solver:
        ...

    @property
    def judge_solver(self) -> Solver:
        ...

    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
        ...

    def _extract_answer(self, raw_result: SolverResult) -> str:
        ...

    @property
    def name(self) -> str:
        # returns "SelfConsistencySolver wrapping {solver.name}"
        ...

Import

from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver

I/O Contract

Inputs

Name	Type	Required	Description
solver	SolverSpec	Yes	Specification for the solver used to generate individual chain-of-thought completions. Also used as the judge_solver by default.
num_generations	int	No	Number of chain-of-thought completions to generate. Defaults to 5.
cot_template	str	No	Template string for the chain-of-thought prompt. Contains a {prefix} placeholder that gets formatted with the answer_prefix. Defaults to DEFAULT_COT_TEMPLATE.
answer_prefix	str	No	Prefix string used to locate the answer within each completion. Defaults to "The answer is".
judge_prompt	Optional[str]	No	Custom prompt for the judge in "judge" mode. Contains {question} and {prefix} placeholders. Defaults to DEFAULT_JUDGE_PROMPT if not provided.
mode	str	No	Consensus selection mode: "count" for majority vote or "judge" for LLM-based consensus. Defaults to "count".
persistent_memory	bool	No	Whether to maintain private reasoning messages across turns. Defaults to True.
private_interaction_length	int	No	Base number of additional private messages to cache (num_generations is added internally). Defaults to 1.
postprocessors	list[str]	No	List of postprocessor names to apply to solver output. Defaults to an empty list.
registry	Any	No	Registry object for resource lookup.
task_state	TaskState	Yes	The current evaluation task state, passed to _solve.
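The placeholder behavior of cot_template and judge_prompt can be sketched with ordinary str.format calls. The two template strings below are hypothetical and stand in for DEFAULT_COT_TEMPLATE and DEFAULT_JUDGE_PROMPT, whose actual wording is not reproduced here; only the {prefix} and {question} placeholder names come from the table above.

```python
# Hypothetical templates, written for illustration only.
EXAMPLE_COT_TEMPLATE = (
    "Think step by step, then finish with '{prefix} <answer>'."
)
EXAMPLE_JUDGE_PROMPT = (
    "Question: {question}\n"
    "Review the reasoning above and reply with '{prefix} <answer>'."
)

answer_prefix = "The answer is"

# cot_template receives only the answer prefix.
cot_prompt = EXAMPLE_COT_TEMPLATE.format(prefix=answer_prefix)

# judge_prompt receives both the original question and the answer prefix.
judge_prompt = EXAMPLE_JUDGE_PROMPT.format(
    question="What is 3 + 4?",
    prefix=answer_prefix,
)

print(cot_prompt)
print(judge_prompt)
```

A custom template that omits {prefix} would still format without error, but the model would then have no instruction to emit the prefix that answer extraction relies on.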

Outputs

Name	Type	Description
SolverResult	SolverResult	Contains output (the consensus answer string) and reasoning_completions (list of all raw chain-of-thought completion strings).
name	str	Returns "SelfConsistencySolver wrapping {solver.name}" identifying the wrapped solver.

Usage Examples

from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver
from evals.solvers.solver import SolverSpec

# Define SelfConsistencySolver via YAML-style config (typical usage)
# solver:
#   class: evals.solvers.nested.self_consistency_solver:SelfConsistencySolver
#   args:
#     solver:
#       class: evals.solvers.openai_solver:OpenAISolver
#       args:
#         model: gpt-4
#     num_generations: 7
#     mode: count

# Programmatic usage with majority vote mode.
# The spec is a mapping with the same "class" and "args" keys as the YAML config above.
solver_spec: SolverSpec = {
    "class": "evals.solvers.openai_solver:OpenAISolver",
    "args": {"model": "gpt-4"},
}

solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=7,
    mode="count",
    answer_prefix="The answer is",
)

# task_state is the TaskState for the current sample; it is normally supplied by the running eval.
result = solver(task_state)
print(result.output)                 # consensus answer from majority vote
print(result.reasoning_completions)  # all 7 raw reasoning completions

# Using judge mode for more nuanced consensus
judge_solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=5,
    mode="judge",
)

result = judge_solver(task_state)
print(result.output)  # consensus answer determined by the judge LLM

Related Pages

Environment:Openai_Evals_Python_Runtime
