Implementation:Openai Evals SelfConsistencySolver
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Solvers |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
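A concrete solver, provided by the evals library, that implements self-consistency prompting: it samples several chain-of-thought completions for the same prompt and selects a consensus answer, by majority vote or with an LLM judge.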
Description
SelfConsistencySolver is a subclass of NestedSolver that implements the self-consistency prompting technique. It works by generating multiple chain-of-thought completions for the same prompt and then selecting the consensus answer. This approach exploits the intuition that correct reasoning paths tend to converge on the same answer, even when intermediate steps differ.
The solver operates in two modes controlled by the mode parameter:
- "count" mode (default): Each completion's answer is extracted using the answer_prefix (default: "The answer is"), and the most frequently occurring answer is selected via majority vote using Python's Counter. If no answers can be extracted from any completion, an error is logged.
- "judge" mode: All reasoning completions are passed to a judge_solver along with a judge_prompt that asks the judge to determine the consensus answer. If the judge cannot extract a valid answer, "[NO CONSENSUS]" is returned.
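The majority vote used in "count" mode can be sketched with Python's Counter. This is a minimal illustration under stated assumptions, not the library's code: the function name majority_vote is hypothetical, and returning None for an empty list stands in for the error that the solver logs when no answer can be extracted.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: list[str]) -> Optional[str]:
    """Return the most frequent answer, or None if no answers were extracted."""
    if not answers:
        # Assumption: the real solver logs an error here instead of returning None.
        return None
    # most_common(1) returns [(answer, count)]; on a tie, the answer that
    # appeared first wins because Counter preserves insertion order.
    return Counter(answers).most_common(1)[0][0]


# Answers extracted from five hypothetical reasoning completions.
extracted = ["42", "41", "42", "42", "17"]
print(majority_vote(extracted))  # 42
```

Ties are broken by first appearance in this sketch; how the actual solver resolves ties is not stated here, so treat that behavior as an assumption.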
The solver supports persistent memory through PersistentMemoryCache. When enabled, the cache tracks the CoT prompt and all num_generations reasoning completions as private messages, preserving them across multi-turn evaluations. The private_interaction_length parameter controls how many additional messages beyond the reasoning completions to cache.
The _extract_answer method performs case-insensitive matching on the answer_prefix to locate and extract the answer portion from each completion's raw output text. It raises a ValueError if the prefix is not found, which is caught and logged during the generation loop.
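The extraction step described above might look roughly like the following. This is a hedged sketch, not the method's actual body: extract_answer is a hypothetical standalone function that takes a plain string rather than a SolverResult, and it assumes the answer is everything after the first occurrence of the prefix, with surrounding whitespace stripped.

```python
import logging
import re


def extract_answer(raw_output: str, answer_prefix: str = "The answer is") -> str:
    """Return the text that follows the first case-insensitive prefix match."""
    match = re.search(re.escape(answer_prefix), raw_output, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Answer prefix {answer_prefix!r} not found in completion")
    return raw_output[match.end():].strip()


# Mimic the generation loop: completions without the prefix are logged and skipped.
completions = [
    "Step 1 ... THE ANSWER IS 12",
    "I am not sure.",
    "so the answer is 12",
]
answers = []
for text in completions:
    try:
        answers.append(extract_answer(text))
    except ValueError as err:
        logging.warning("Skipping completion: %s", err)

print(answers)  # ['12', '12']
```

Catching the ValueError inside the loop means one malformed completion does not discard the other generations, which matters when the vote is taken over only a handful of samples.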
Usage
Use SelfConsistencySolver when you need more robust answers than a single chain-of-thought completion can provide. It is best suited to tasks with discrete answers (e.g., multiple choice, numerical computation), where sampling several reasoning paths and voting on the result can improve accuracy at the cost of num_generations solver calls per question.
Code Reference
Source Location
- Repository: Openai_Evals
- File: evals/solvers/nested/self_consistency_solver.py
- Lines: 1-150
Signature
class SelfConsistencySolver(NestedSolver):
    def __init__(
        self,
        solver: SolverSpec,
        num_generations: int = 5,
        cot_template: str = DEFAULT_COT_TEMPLATE,
        answer_prefix: str = DEFAULT_ANSWER_PREFIX,
        judge_prompt: Optional[str] = None,
        mode: str = "count",
        persistent_memory: bool = True,
        private_interaction_length: int = 1,
        postprocessors: list[str] = [],
        registry: Any = None,
    ):
        ...

    @property
    def solver(self) -> Solver:
        ...

    @property
    def judge_solver(self) -> Solver:
        ...

    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
        ...

    def _extract_answer(self, raw_result: SolverResult) -> str:
        ...

    @property
    def name(self) -> str:
        # returns "SelfConsistencySolver wrapping {solver.name}"
        ...
Import
from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| solver | SolverSpec | Yes | Specification for the solver used to generate individual chain-of-thought completions. Also used as the judge_solver by default. |
| num_generations | int | No | Number of chain-of-thought completions to generate. Defaults to 5. |
| cot_template | str | No | Template string for the chain-of-thought prompt. Contains a {prefix} placeholder that gets formatted with the answer_prefix. Defaults to DEFAULT_COT_TEMPLATE. |
| answer_prefix | str | No | Prefix string used to locate the answer within each completion. Defaults to "The answer is". |
| judge_prompt | Optional[str] | No | Custom prompt for the judge in "judge" mode. Contains {question} and {prefix} placeholders. Defaults to DEFAULT_JUDGE_PROMPT if not provided. |
| mode | str | No | Consensus selection mode: "count" for majority vote or "judge" for LLM-based consensus. Defaults to "count". |
| persistent_memory | bool | No | Whether to maintain private reasoning messages across turns. Defaults to True. |
| private_interaction_length | int | No | Base number of additional private messages to cache (num_generations is added internally). Defaults to 1. |
| postprocessors | list[str] | No | List of postprocessor names to apply to solver output. Defaults to an empty list. |
| registry | Any | No | Registry object for resource lookup. |
| task_state | TaskState | Yes | The current evaluation task state, passed to _solve. |
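How the {prefix} and {question} placeholders get filled can be illustrated as follows. This is a sketch that assumes str.format-style substitution; the two template strings are hypothetical stand-ins, and the library's DEFAULT_COT_TEMPLATE and DEFAULT_JUDGE_PROMPT may be worded differently.

```python
# Hypothetical templates for illustration only; not the library's defaults.
COT_TEMPLATE = "Think step by step, then finish with: {prefix} <your answer>"
JUDGE_PROMPT = (
    "Several responses to the question below are shown.\n"
    "Question: {question}\n"
    "State the consensus answer as: {prefix} <answer>"
)

answer_prefix = "The answer is"

# The chain-of-thought template only needs the answer prefix.
cot_prompt = COT_TEMPLATE.format(prefix=answer_prefix)

# The judge prompt needs both the original question and the prefix.
judge_prompt = JUDGE_PROMPT.format(question="What is 6 * 7?", prefix=answer_prefix)

print(cot_prompt)
print(judge_prompt)
```

Because the same answer_prefix is embedded in the prompt and later used for extraction, a custom cot_template or judge_prompt should keep the {prefix} placeholder so that completions remain parseable.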
Outputs
| Name | Type | Description |
|---|---|---|
| SolverResult | SolverResult | Contains output (the consensus answer string) and reasoning_completions (list of all raw chain-of-thought completion strings). |
| name | str | Returns "SelfConsistencySolver wrapping {solver.name}" identifying the wrapped solver. |
Usage Examples
from evals.solvers.nested.self_consistency_solver import SelfConsistencySolver
from evals.solvers.solver import SolverSpec

# Define SelfConsistencySolver via YAML-style config (typical usage)
# solver:
#   class: evals.solvers.nested.self_consistency_solver:SelfConsistencySolver
#   args:
#     solver:
#       class: evals.solvers.openai_solver:OpenAISolver
#       args:
#         model: gpt-4
#     num_generations: 7
#     mode: count

# Programmatic usage with majority vote mode.
# The spec is written as a plain dict with the same "class" and "args" keys
# as the YAML config above.
solver_spec: SolverSpec = {
    "class": "evals.solvers.openai_solver:OpenAISolver",
    "args": {"model": "gpt-4"},
}
solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=7,
    mode="count",
    answer_prefix="The answer is",
)

# task_state is a TaskState supplied by the running eval (not constructed here).
result = solver(task_state)
print(result.output)  # consensus answer from majority vote
print(result.reasoning_completions)  # all 7 raw reasoning completions

# Using judge mode for more nuanced consensus
judge_mode_solver = SelfConsistencySolver(
    solver=solver_spec,
    num_generations=5,
    mode="judge",
)
result = judge_mode_solver(task_state)
print(result.output)  # consensus answer determined by the judge LLM