Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Openai Evals SelfConsistencySolver

From Leeroopedia
Revision as of 13:34, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Openai_Evals_SelfConsistencySolver.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Evaluation, Solvers
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete tool for self-consistency prompting with majority-vote answer selection provided by the evals library.

Description

SelfConsistencySolver is a subclass of NestedSolver that implements the self-consistency prompting technique. It works by generating multiple chain-of-thought completions for the same prompt and then selecting the consensus answer. This approach exploits the intuition that correct reasoning paths tend to converge on the same answer, even when intermediate steps differ.

The solver operates in two modes controlled by the mode parameter:

  • "count" mode (default): Each completion's answer is extracted using the answer_prefix (default: "The answer is"), and the most frequently occurring answer is selected via majority vote using Python's Counter. If no answers can be extracted from any completion, an error is logged.
  • "judge" mode: All reasoning completions are passed to a judge_solver along with a judge_prompt that asks the judge to determine the consensus answer. If the judge cannot extract a valid answer, "[NO CONSENSUS]" is returned.

The solver supports persistent memory through PersistentMemoryCache. When enabled, the cache tracks the CoT prompt and all num_generations reasoning completions as private messages, preserving them across multi-turn evaluations. The private_interaction_length parameter controls how many additional messages beyond the reasoning completions to cache.

The _extract_answer method performs case-insensitive matching on the answer_prefix to locate and extract the answer portion from each completion's raw output text. It raises a ValueError if the prefix is not found, which is caught and logged during the generation loop.

Usage

Import SelfConsistencySolver when you need more robust answers than a single chain-of-thought can provide. This is particularly effective for tasks with discrete answers (e.g., multiple choice, numerical computation) where generating multiple reasoning paths and voting on the result significantly improves accuracy.

Code Reference

Source Location

Signature

class SelfConsistencySolver(NestedSolver):
    def __init__(
        self,
        solver: SolverSpec,
        num_generations: int = 5,
        cot_template: str = DEFAULT_COT_TEMPLATE,
        answer_prefix: str = DEFAULT_ANSWER_PREFIX,
        judge_prompt: Optional[str] = None,
        mode: str = "count",
        persistent_memory: bool = True,
        private_interaction_length: int = 1,
        postprocessors: list[str] = [],
        registry: Any = None,
    ):
        ...

    @property
    def solver(self) -> Solver:
        ...

    @property
    def judge_solver(self) -> Solver:
        ...

    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
        ...

    def _extract_answer(self, raw_result: SolverResult) -> str:
        ...

    @property
    def name(self) -> str:
        # returns "SelfConsistencySolver wrapping {solver.name}"
        ...

Import

from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver

I/O Contract

Inputs

Name Type Required Description
solver SolverSpec Yes Specification for the solver used to generate individual chain-of-thought completions. Also used as the judge_solver by default.
num_generations int No Number of chain-of-thought completions to generate. Defaults to 5.
cot_template str No Template string for the chain-of-thought prompt. Contains a {prefix} placeholder that gets formatted with the answer_prefix. Defaults to DEFAULT_COT_TEMPLATE.
answer_prefix str No Prefix string used to locate the answer within each completion. Defaults to "The answer is".
judge_prompt Optional[str] No Custom prompt for the judge in "judge" mode. Contains {question} and {prefix} placeholders. Defaults to DEFAULT_JUDGE_PROMPT if not provided.
mode str No Consensus selection mode: "count" for majority vote or "judge" for LLM-based consensus. Defaults to "count".
persistent_memory bool No Whether to maintain private reasoning messages across turns. Defaults to True.
private_interaction_length int No Base number of additional private messages to cache (num_generations is added internally). Defaults to 1.
postprocessors list[str] No List of postprocessor names to apply to solver output. Defaults to an empty list.
registry Any No Registry object for resource lookup.
task_state TaskState Yes The current evaluation task state, passed to _solve.

Outputs

Name Type Description
SolverResult SolverResult Contains output (the consensus answer string) and reasoning_completions (list of all raw chain-of-thought completion strings).
name str Returns "SelfConsistencySolver wrapping {solver.name}" identifying the wrapped solver.

Usage Examples

from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver
from evals.solvers.solver import SolverSpec

# Define SelfConsistencySolver via YAML-style config (typical usage)
# solver:
#   class: evals.solvers.nested.self_consistency_solver:SelfConsistencySolver
#   args:
#     solver:
#       class: evals.solvers.openai_solver:OpenAISolver
#       args:
#         model: gpt-4
#     num_generations: 7
#     mode: count

# Programmatic usage with majority vote mode
solver_spec = SolverSpec(
    class_name="evals.solvers.openai_solver:OpenAISolver",
    args={"model": "gpt-4"},
)

solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=7,
    mode="count",
    answer_prefix="The answer is",
)

result = solver(task_state)
print(result.output)                 # consensus answer from majority vote
print(result.reasoning_completions)  # all 7 raw reasoning completions

# Using judge mode for more nuanced consensus
judge_solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=5,
    mode="judge",
)

result = judge_solver(task_state)
print(result.output)  # consensus answer determined by the judge LLM

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment