Implementation:Openai Evals Classify Function
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, LLM_as_Judge |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Concrete tool, provided by the classify_utils module, for running model-graded classification from an evaluation spec and a grading completion function.
Description
The classify function executes a model-graded evaluation for a single sample. It takes a ModelGradedSpec, appends the answer prompt that matches eval_type, invokes the grading completion function, parses the response to extract a choice string, and maps that choice to a numeric score. It returns the chosen classification together with a metadata dict holding the score, the sampled text, the prompt, and an invalid_choice flag.
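The flow described above can be sketched in a few lines. This is a simplified, self-contained illustration and not the library's implementation: `classify_sketch`, `grader`, and the plain-string prompt handling are hypothetical stand-ins, and the matching rule assumes the "starts or ends with" behaviour suggested by the default match function name.

```python
from typing import Callable, Optional

INVALID = "__invalid__"  # sentinel value named in the I/O contract


def classify_sketch(
    prompt: str,
    grader: Callable[[str], str],  # hypothetical stand-in for a completion function
    choice_strings: list[str],
    choice_scores: Optional[dict[str, float]] = None,
    answer_prompt: str = "",
) -> tuple[str, dict]:
    # 1. Append the answer prompt (in the real function this depends on eval_type).
    full_prompt = f"{prompt}\n\n{answer_prompt}" if answer_prompt else prompt

    # 2. Ask the grading model for its judgement.
    sampled = grader(full_prompt)

    # 3. Parse the response: first choice the text starts or ends with.
    text = sampled.strip()
    choice = next(
        (c for c in choice_strings if text.startswith(c) or text.endswith(c)),
        INVALID,
    )

    # 4. Map the choice to a score; None when no mapping applies.
    score = choice_scores.get(choice) if choice_scores else None

    return choice, {
        "score": score,
        "sampled": [sampled],
        "prompt": full_prompt,
        "invalid_choice": choice == INVALID,
    }
```

Passing a plain callable such as `lambda p: "Reasoning... Yes"` as `grader` is enough to exercise the sketch without any network access.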
Usage
Called internally by ModelBasedClassify.eval_sample once for each test sample. It can also be called directly when writing custom model-graded evaluation logic.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/elsuite/modelgraded/classify_utils.py (lines 51-87)
Signature
def classify(
    mg: ModelGradedSpec,
    completion_fn: CompletionFn,
    completion_kwargs: Optional[dict[str, Any]] = None,
    format_kwargs: Optional[dict[str, Any]] = None,
    eval_type: Optional[str] = None,
    n: Optional[int] = None,
    match_fn: str = "starts_or_endswith",
) -> tuple[str, dict]:
    """
    Run model-graded classification for a single sample.

    Args:
        mg: ModelGradedSpec with prompt template and choice configuration.
        completion_fn: CompletionFn for the grading model.
        completion_kwargs: Extra kwargs for the completion call (e.g. max_tokens).
        format_kwargs: Values to fill prompt template placeholders.
        eval_type: Override eval_type from spec ("classify", "classify_cot", "cot_classify").
        n: Number of completions (for multi-completion specs).
        match_fn: Match function name ("include", "exact", "endswith", "starts_or_endswith").

    Returns:
        Tuple of (choice_string, info_dict) where info_dict contains
        score, sampled text, prompt, and invalid_choice flag.
    """
Import
from evals.elsuite.modelgraded.classify_utils import classify, ANSWER_PROMPTS, MATCH_FNS
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| mg | ModelGradedSpec | Yes | Evaluation specification with prompt, choices, scoring |
| completion_fn | CompletionFn | Yes | Grading model completion function |
| completion_kwargs | dict | No | Extra kwargs for completion call |
| format_kwargs | dict | No | Values for template placeholders |
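| eval_type | str | No | Classification strategy override ("classify", "classify_cot", "cot_classify") |
| n | int | No | Number of completions (for multi-completion specs) |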
| match_fn | str | No | Match function override (default "starts_or_endswith") |
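To make the four match_fn names concrete, here is a minimal sketch of how such matchers could behave. The semantics are assumptions inferred from the names, and `MATCHERS` and `pick_choice` are hypothetical helpers for illustration, not the module's `MATCH_FNS` table.

```python
from typing import Callable

# Assumed semantics, inferred from the match function names.
MATCHERS: dict[str, Callable[[str, str], bool]] = {
    "include": lambda text, choice: choice in text,
    "exact": lambda text, choice: text == choice,
    "endswith": lambda text, choice: text.endswith(choice),
    "starts_or_endswith": lambda text, choice: (
        text.startswith(choice) or text.endswith(choice)
    ),
}


def pick_choice(
    text: str,
    choice_strings: list[str],
    match_fn: str = "starts_or_endswith",
) -> str:
    """Return the first choice the matcher accepts, else "__invalid__"."""
    matcher = MATCHERS[match_fn]
    line = text.strip()
    for choice in choice_strings:
        if matcher(line, choice):
            return choice
    return "__invalid__"
```

A stricter matcher such as "exact" rejects answers that carry extra explanation, which is why a looser default is convenient when the grader is allowed to reason in free text.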
Outputs
| Name | Type | Description |
|---|---|---|
| choice | str | Selected choice string or "__invalid__" if parsing failed |
| info | dict | Contains: score (float or None), sampled (list[str]), prompt, invalid_choice (bool) |
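Because each call returns one (choice, info) pair, per-sample results are usually aggregated afterwards. The sketch below shows one possible aggregation over pairs shaped like the outputs above; `summarize` is a hypothetical helper, and the sample data in the usage note is made up for illustration.

```python
from collections import Counter
from typing import Optional


def summarize(results: list[tuple[str, dict]]) -> dict:
    """Aggregate (choice, info) pairs into counts, invalid rate, and mean score."""
    counts = Counter(choice for choice, _ in results)

    # Samples whose response could not be parsed carry invalid_choice=True.
    invalid = sum(1 for _, info in results if info["invalid_choice"])
    invalid_rate = invalid / len(results) if results else 0.0

    # Scores may be None (for example, for invalid choices); skip those.
    scores = [info["score"] for _, info in results if info["score"] is not None]
    mean_score: Optional[float] = sum(scores) / len(scores) if scores else None

    return {
        "counts": dict(counts),
        "invalid_rate": invalid_rate,
        "mean_score": mean_score,
    }
```

For example, feeding it two "Yes" results scored 1.0, one "No" scored 0.0, and one "__invalid__" result with score None yields an invalid rate of 0.25 and a mean score computed over the three scored samples only.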
Usage Examples
Direct Classification
from evals.elsuite.modelgraded.classify_utils import classify
from evals.elsuite.modelgraded.base import ModelGradedSpec
from evals.registry import Registry
registry = Registry()
mg = registry.get_modelgraded_spec("fact")
completion_fn = registry.make_completion_fn("gpt-4")
choice, info = classify(
    mg=mg,
    completion_fn=completion_fn,
    # Keys must match the placeholders in the spec's prompt template.
    format_kwargs={
        "input": "What is the capital of France?",
        "ideal": "Paris",
        "completion": "The capital of France is Paris.",
    },
    eval_type="cot_classify",
)
print(f"Choice: {choice}")  # one of the spec's choice strings, or "__invalid__"
print(f"Score: {info['score']}")  # float, or None if no score applies
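For offline experiments it can help to replace the grading model with a canned reply. The sketch below assumes that a completion function only needs to be a callable accepting a prompt plus keyword arguments and returning an object with a `get_completions()` method; `StubResult` and `StubCompletionFn` are hypothetical names, and whether such a stub satisfies every code path in the library is not verified here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubResult:
    """Minimal result object exposing get_completions(), as assumed above."""
    text: str

    def get_completions(self) -> list[str]:
        return [self.text]


@dataclass
class StubCompletionFn:
    """Callable that records each prompt and always returns the same reply."""
    reply: str
    prompts: list[Any] = field(default_factory=list)

    def __call__(self, prompt: Any, **kwargs: Any) -> StubResult:
        self.prompts.append(prompt)  # keep prompts for later inspection
        return StubResult(self.reply)
```

An instance such as `StubCompletionFn("Yes")` could then be passed wherever a grading completion function is expected, and its `prompts` list inspected to see what the grader would have received.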