Implementation:Openai Evals Classify Function

From Leeroopedia
Knowledge Sources
Domains Evaluation, LLM_as_Judge
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete tool, provided by the classify_utils module, for running model-graded classification from a spec and a completion function.

Description

The classify function executes a model-graded evaluation for a single sample. It takes a ModelGradedSpec and a grading completion function, fills the prompt template placeholders from format_kwargs, and appends the answer prompt that matches eval_type. It then invokes the grading completion function, parses the response to extract a choice string, and maps that choice to a numeric score. It returns the chosen classification together with a metadata dict containing the score, the sampled text, the prompt, and an invalid_choice flag.
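The flow described above can be sketched without the evals package. This is a minimal illustration under stated assumptions, not the library's implementation: sketch_classify, ANSWER_SUFFIX, and fake_grader are hypothetical names, the answer-prompt wording is invented for the example, and the parsing rule (last line for "cot_classify", first line otherwise) is a simplification.

```python
from typing import Callable, Optional

INVALID = "__invalid__"  # sentinel value named in the Outputs table

# Hypothetical answer-prompt suffixes keyed by eval_type (illustrative wording only).
ANSWER_SUFFIX = {
    "classify": "Answer with exactly one of: {choices}.",
    "cot_classify": "Reason step by step, then print one of: {choices} on the last line.",
}


def sketch_classify(
    prompt: str,
    choices: list[str],
    scores: Optional[dict[str, float]],
    grader: Callable[[str], str],
    eval_type: str = "cot_classify",
) -> tuple[str, dict]:
    # 1. Append the answer prompt that matches eval_type.
    suffix = ANSWER_SUFFIX[eval_type].format(choices=", ".join(choices))
    full_prompt = f"{prompt}\n\n{suffix}"
    # 2. Invoke the grading model (here, any callable from prompt to text).
    sampled = grader(full_prompt)
    # 3. Parse the response: read the verdict from the last line for
    #    "cot_classify", otherwise from the first line (assumed rule).
    lines = sampled.strip().splitlines() or [""]
    verdict = (lines[-1] if eval_type == "cot_classify" else lines[0]).strip()
    choice = next(
        (c for c in choices if verdict.startswith(c) or verdict.endswith(c)),
        INVALID,
    )
    # 4. Map the choice to a score; unknown choices get None in this sketch.
    score = scores.get(choice) if scores is not None else None
    info = {
        "score": score,
        "sampled": [sampled],
        "prompt": full_prompt,
        "invalid_choice": choice == INVALID,
    }
    return choice, info


def fake_grader(prompt: str) -> str:
    # Stand-in for a real grading model.
    return "The submission agrees with the expert answer.\nYes"


choice, info = sketch_classify(
    "Is the submission correct?", ["Yes", "No"], {"Yes": 1.0, "No": 0.0}, fake_grader
)
print(choice, info["score"], info["invalid_choice"])
```

Passing the grader as a plain callable keeps the sketch testable offline; with the real library the completion function would call the grading model.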

Usage

Called internally by ModelBasedClassify.eval_sample for each test sample. Can also be used directly for custom model-graded evaluation logic.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/elsuite/modelgraded/classify_utils.py (lines 51-87)

Signature

def classify(
    mg: ModelGradedSpec,
    completion_fn: CompletionFn,
    completion_kwargs: Optional[dict[str, Any]] = None,
    format_kwargs: Optional[dict[str, Any]] = None,
    eval_type: Optional[str] = None,
    n: Optional[int] = None,
    match_fn: str = "starts_or_endswith",
) -> tuple[str, dict]:
    """
    Run model-graded classification for a single sample.

    Args:
        mg: ModelGradedSpec with prompt template and choice configuration.
        completion_fn: CompletionFn for the grading model.
        completion_kwargs: Extra kwargs for the completion call (e.g. max_tokens).
        format_kwargs: Values to fill prompt template placeholders.
        eval_type: Override eval_type from spec ("classify", "classify_cot", "cot_classify").
        n: Number of completions (for multi-completion specs).
        match_fn: Override match function ("include", "exact", "endswith", "starts_or_endswith").

    Returns:
        Tuple of (choice_string, info_dict) where info_dict contains
        score, sampled text, prompt, and invalid_choice flag.
    """

Import

from evals.elsuite.modelgraded.classify_utils import classify, ANSWER_PROMPTS, MATCH_FNS

I/O Contract

Inputs

Name               Type             Required  Description
mg                 ModelGradedSpec  Yes       Evaluation specification with prompt, choices, and scoring
completion_fn      CompletionFn     Yes       Completion function for the grading model
completion_kwargs  dict             No        Extra kwargs for the completion call (e.g. max_tokens)
format_kwargs      dict             No        Values for prompt template placeholders
eval_type          str              No        Classification strategy override ("classify", "classify_cot", "cot_classify")
n                  int              No        Number of completions (for multi-completion specs)
match_fn           str              No        Match function override (default "starts_or_endswith")
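To make the match_fn options concrete, the sketch below compares the four named strategies on one graded line. The semantics are an assumption inferred from the option names, and MATCH_SKETCH is a hypothetical table written for this example; it is not the library's MATCH_FNS.

```python
from typing import Callable

# Assumed semantics, inferred from the option names only.
MATCH_SKETCH: dict[str, Callable[[str, str], bool]] = {
    "include": lambda text, choice: choice in text,
    "exact": lambda text, choice: text == choice,
    "endswith": lambda text, choice: text.endswith(choice),
    "starts_or_endswith": lambda text, choice: text.startswith(choice)
    or text.endswith(choice),
}

graded_line = "Yes, the answer is correct"
results = {name: fn(graded_line, "Yes") for name, fn in MATCH_SKETCH.items()}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

Under these assumed semantics, a stricter function such as "exact" rejects any grader output that carries extra text, while "include" accepts a choice string appearing anywhere in the line, so the choice of match function trades false negatives against false positives.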

Outputs

Name    Type  Description
choice  str   Selected choice string, or "__invalid__" if parsing failed
info    dict  Contains score (float or None), sampled (list[str]), prompt, and invalid_choice (bool)
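When classify is called over many samples, the (choice, info) pairs usually need aggregating. The sketch below shows one way to do that, assuming only the output shape listed above; summarize is a hypothetical helper and the sample results are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Made-up (choice, info) pairs in the shape described by the Outputs table.
results = [
    ("Yes", {"score": 1.0, "invalid_choice": False}),
    ("No", {"score": 0.0, "invalid_choice": False}),
    ("__invalid__", {"score": None, "invalid_choice": True}),
]


def summarize(pairs):
    # Count how often each choice string was selected.
    counts = Counter(choice for choice, _ in pairs)
    # Average only the samples that carry a numeric score.
    scores = [info["score"] for _, info in pairs if info["score"] is not None]
    # Fraction of samples whose grader output could not be parsed.
    invalid = sum(1 for _, info in pairs if info["invalid_choice"])
    return {
        "counts": dict(counts),
        "mean_score": mean(scores) if scores else None,
        "invalid_rate": invalid / len(pairs) if pairs else 0.0,
    }


summary = summarize(results)
print(summary)
```

Tracking the invalid rate separately from the mean score makes it easier to tell a weak submission apart from a grader whose output does not follow the answer format.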

Usage Examples

Direct Classification

from evals.elsuite.modelgraded.classify_utils import classify
from evals.elsuite.modelgraded.base import ModelGradedSpec
from evals.registry import Registry

registry = Registry()
mg = registry.get_modelgraded_spec("fact")
completion_fn = registry.make_completion_fn("gpt-4")

choice, info = classify(
    mg=mg,
    completion_fn=completion_fn,
    format_kwargs={
        "input": "What is the capital of France?",
        "ideal": "Paris",
        "completion": "The capital of France is Paris.",
    },
    eval_type="cot_classify",
)

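# The printed values depend on the spec's choice strings and on the grading model's output.
print(f"Choice: {choice}")  # one of the spec's choice strings, or "__invalid__"
print(f"Score: {info['score']}")  # float or None, depending on the spec's scoring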

Related Pages

Implements Principle

Uses Heuristic
