Implementation:Openai Evals HumanCliSolver
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Solvers |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A concrete tool, provided by the evals library, for interactive human-in-the-loop evaluation.
Description
HumanCliSolver is a subclass of Solver that enables a human evaluator to act as the solver by reading prompts printed to the command line and typing responses via standard input. When _solve is called, it concatenates the system task description and all conversation messages into a formatted prompt string, prints it to the terminal, and waits for the human to type an answer. The answer is recorded via record_sampling with the model name set to "human".
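The flow described above can be sketched without importing the evals library. This is a minimal illustration under stated assumptions, not the library's code: `Message` is a stand-in dataclass, `build_prompt` and `ask_human` are hypothetical helper names, and the one-line-per-message `role: content` layout is an assumed format for the prompt string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    """Stand-in for the library's message type (assumed fields: role, content)."""

    role: str
    content: str


def build_prompt(task_description: str, messages: list[Message], input_prompt: str) -> str:
    """Join the system task description and the conversation into one prompt string."""
    all_msgs = [Message("system", task_description)] + list(messages)
    lines = [f"{m.role}: {m.content}" for m in all_msgs]
    # The input prompt goes on its own final line, so the typed answer follows it.
    return "\n".join(lines) + "\n" + input_prompt


def ask_human(prompt: str, read: Callable[[str], str] = input) -> str:
    """Show the prompt and block until the human submits a line of text."""
    return read(prompt)


prompt = build_prompt(
    "Answer with a single number.",
    [Message("user", "What is 2 + 2?")],
    "assistant (you): ",
)
```

Passing the reader as a parameter (`read`) is a design choice made for this sketch only: it lets the prompt logic be exercised with a fake reader, while the default remains the built-in `input`, which blocks on stdin.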
This solver is meant for single-threaded execution only. Because it prints to the terminal and reads from stdin, concurrent evaluation threads would interleave their prompts and typed input unpredictably. Set the environment variable EVALS_SEQUENTIAL=1 so that samples are evaluated one at a time.
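One way to make the single-threaded requirement explicit is to check the environment before starting a human-driven run. The guard below is a hypothetical sketch: `sequential_mode_enabled` and `require_sequential` are illustrative names, and nothing here implies that the evals library performs this check itself.

```python
import os


def sequential_mode_enabled(environ=os.environ) -> bool:
    """Return True when EVALS_SEQUENTIAL is set to "1" in the given environment."""
    return environ.get("EVALS_SEQUENTIAL") == "1"


def require_sequential(environ=os.environ) -> None:
    """Fail early instead of letting several threads compete for stdin."""
    if not sequential_mode_enabled(environ):
        raise RuntimeError(
            "Human CLI solving needs EVALS_SEQUENTIAL=1 so prompts are not interleaved."
        )
```

Calling `require_sequential()` before constructing the solver turns a confusing interleaved terminal session into an immediate, readable error.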
Usage
Import HumanCliSolver when you need to manually evaluate samples by having a human provide answers through the command line. This is useful for establishing human baselines on evaluation tasks, debugging prompts, or spot-checking eval samples interactively.
Code Reference
Source Location
- Repository: Openai_Evals
- File: evals/solvers/human_cli_solver.py
- Lines: 1-48
Signature
class HumanCliSolver(Solver):
    def __init__(
        self,
        input_prompt: str = "assistant (you): ",
        postprocessors: list[str] = [],
        registry: Any = None,
    ):
        ...

    def _solve(self, task_state: TaskState, **kwargs) -> SolverResult:
        ...

    @property
    def name(self) -> str:
        # returns "human"
        ...
Import
from evals.solvers.human_cli_solver import HumanCliSolver
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_prompt | str | No | Prompt string displayed before the human types their answer. Defaults to "assistant (you): ". |
| postprocessors | list[str] | No | List of postprocessor names to apply to solver output. Defaults to an empty list. |
| registry | Any | No | Registry object for resource lookup. Not used directly by this solver. |
| task_state | TaskState | Yes | The current evaluation task state containing the task description and message history, passed to _solve. |
Outputs
| Name | Type | Description |
|---|---|---|
| SolverResult | SolverResult | Contains the human-typed answer string as the output field. |
| name | str | Always returns "human" to identify this solver in logs and records. |
Usage Examples
from evals.solvers.human_cli_solver import HumanCliSolver

# Create a solver with the default input prompt
solver = HumanCliSolver()

# Create a solver with a custom input prompt
solver = HumanCliSolver(input_prompt="Your answer: ")

# Inside an eval run (requires EVALS_SEQUENTIAL=1), task_state is the TaskState
# for the current sample. The solver prints the conversation and waits for typed input.
result = solver._solve(task_state)
print(result.output)  # whatever the human typed