Principle:Openai Evals Eval Type Configuration
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, LLM_as_Judge |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A classification strategy configuration that determines how a grading LLM produces and structures its evaluation verdict.
Description
Eval Type Configuration selects the prompting strategy used by the grading model to produce a classification verdict. Three strategies are supported: classify (direct answer only), classify_cot (answer first, then reasoning), and cot_classify (reasoning first, then answer). The strategy affects both the prompt suffix appended to the evaluation prompt and how the grading model's response is parsed to extract the chosen classification. Different match_fn options (include, exact, endswith, starts_or_endswith) control how the choice is extracted from the response text.
Usage
Configure the eval type based on the desired trade-off between grading accuracy and token cost. cot_classify generally produces more accurate grading (the model reasons before deciding) but uses more tokens. classify is the cheapest option and suits simple, unambiguous judgments.
Theoretical Basis
Classification strategies:
- classify: Appends "Answer by printing only a single choice from {choices}..." Direct and token-efficient, but less accurate for complex judgments.
- classify_cot: Appends "First, answer by printing a single choice...Then explain your reasonings." The answer comes first, so the verdict is found at the top of the response and parsing is simpler.
- cot_classify: Appends "First, write out your reasoning...Then print only a single choice...repeat just the answer on a new line." Generally the most accurate but also the most expensive in tokens. Because the verdict comes last, the response lines are scanned in reverse order, so the final answer line is checked before any reasoning line.
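To make the suffix mechanism concrete, here is a minimal sketch of how a strategy-specific suffix could be appended to an evaluation prompt. The templates are paraphrases of the abbreviated quotes above, not the library's exact wording, and the names `ANSWER_SUFFIXES` and `append_answer_suffix` are hypothetical.

```python
from typing import Sequence

# Illustrative templates only: paraphrased from the descriptions above,
# not copied from the library.
ANSWER_SUFFIXES = {
    "classify": "Answer by printing only a single choice from {choices}.",
    "classify_cot": (
        "First, answer by printing a single choice from {choices}. "
        "Then explain your reasoning."
    ),
    "cot_classify": (
        "First, write out your reasoning. "
        "Then print only a single choice from {choices}. "
        "Finally, repeat just the answer on a new line."
    ),
}


def append_answer_suffix(prompt: str, eval_type: str, choices: Sequence[str]) -> str:
    """Return the evaluation prompt with the suffix for eval_type appended."""
    if eval_type not in ANSWER_SUFFIXES:
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    # Render the choices as a quoted list, e.g. "Yes" or "No".
    choices_text = " or ".join(f'"{c}"' for c in choices)
    suffix = ANSWER_SUFFIXES[eval_type].format(choices=choices_text)
    return f"{prompt.rstrip()}\n\n{suffix}"


if __name__ == "__main__":
    print(append_answer_suffix("Is the submission correct?", "cot_classify", ["Yes", "No"]))
```

Only the suffix changes between strategies; the evaluation prompt itself stays the same, which is why switching eval types does not require rewriting the grading prompt.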
Match functions for extracting the choice:
- starts_or_endswith (default): the choice matches if the line starts or ends with it.
- include: the choice matches if it appears anywhere in the line.
- exact: the choice matches only if it equals the whole line.
- endswith: the choice matches if the line ends with it.
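The extraction step can be sketched as follows, assuming the response is split into lines, each line is tested against every choice with the selected match function, and the line order is reversed for cot_classify. The names `MATCH_FNS`, `extract_choice`, and the `INVALID` sentinel are hypothetical, and the matchers implement the descriptions above rather than the library's actual code.

```python
from typing import Callable, Dict, Sequence

# Each matcher takes (line, choice) and follows the descriptions above.
MATCH_FNS: Dict[str, Callable[[str, str], bool]] = {
    "starts_or_endswith": lambda line, choice: line.startswith(choice) or line.endswith(choice),
    "include": lambda line, choice: choice in line,
    "exact": lambda line, choice: line == choice,
    "endswith": lambda line, choice: line.endswith(choice),
}

INVALID = "__invalid__"  # hypothetical sentinel returned when nothing matches


def extract_choice(
    response: str,
    eval_type: str,
    choices: Sequence[str],
    match_fn: str = "starts_or_endswith",
) -> str:
    """Return the first choice matched in the response, or INVALID."""
    matcher = MATCH_FNS[match_fn]
    lines = [line.strip() for line in response.strip().split("\n")]
    if eval_type == "cot_classify":
        # The verdict is printed last, so scan from the bottom up.
        lines.reverse()
    for line in lines:
        if not line:
            continue  # skip blank lines
        for choice in choices:
            if matcher(line, choice):
                return choice
    return INVALID


if __name__ == "__main__":
    response = "No obvious errors appear in the submission.\nYes"
    print(extract_choice(response, "cot_classify", ["Yes", "No"]))  # Yes
    print(extract_choice(response, "classify_cot", ["Yes", "No"]))  # No
```

The usage example shows why scan order matters: the reasoning line happens to start with "No", so scanning from the top would return the wrong verdict, while scanning in reverse finds the final answer line first.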