Implementation:Openai Evals ModelGradedSpec

From Leeroopedia
Knowledge Sources
Domains: Evaluation, LLM_as_Judge
Last Updated: 2026-02-14 10:00 GMT

Overview

Concrete dataclass, provided by the evals modelgraded module, for defining model-graded evaluation specifications.

Description

ModelGradedSpec is a pydantic dataclass that stores the configuration for LLM-as-judge evaluations: the evaluation prompt template, the valid choice strings, the input-output field mapping, and optional per-choice scores. The Registry system loads specs from YAML files in evals/registry/modelgraded/, and ModelBasedClassify and the classify function consume them.

Usage

Define a ModelGradedSpec as a YAML file when creating model-graded evaluations. Reference it by filename (without extension) in the modelgraded_spec argument of ModelBasedClassify.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/elsuite/modelgraded/base.py (lines 11-26)

Signature

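from typing import Optional, Union

from pydantic.dataclasses import dataclass  # pydantic's dataclass decorator

from evals.prompt.base import OpenAICreateChatPrompt

@dataclass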
class ModelGradedSpec:
    # Required fields
    prompt: Union[str, OpenAICreateChatPrompt]
    choice_strings: Union[list[str], str]
    input_outputs: dict[str, str]

    # Optional fields
    eval_type: Optional[str] = None
    choice_scores: Optional[Union[dict[str, float], str]] = None
    output_template: Optional[str] = None

    # Registry metadata
    key: Optional[str] = None
    group: Optional[str] = None

Import

from evals.elsuite.modelgraded.base import ModelGradedSpec

I/O Contract

Inputs

Name | Type | Required | Description
prompt | Union[str, list[dict]] | Yes | Evaluation prompt template with {placeholders}
choice_strings | Union[list[str], str] | Yes | Valid answers: a list of strings, "from_n", "from_n_abc", or "from_n_ABC"
input_outputs | dict[str, str] | Yes | Maps sample keys to template variable names
eval_type | Optional[str] | No | "classify", "classify_cot", or "cot_classify"
choice_scores | Optional[Union[dict, str]] | No | Numeric scores per choice, or "from_strings"
output_template | Optional[str] | No | Template for formatting multi-completion output
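
The shorthand values in the table can be illustrated with a small standalone sketch. It does not import evals; the helper names expand_choice_strings and expand_choice_scores are hypothetical, and the expansion rules (numbers starting at 1, letters starting at "a" or "A", scores parsed from the choice strings themselves) are assumptions inferred from the shorthand names rather than a copy of the library code.

```python
import string
from typing import Optional, Union


def expand_choice_strings(
    choice_strings: Union[list[str], str], n: Optional[int] = None
) -> list[str]:
    """Hypothetical helper: turn a choice_strings value into an explicit list."""
    if isinstance(choice_strings, list):
        return list(choice_strings)
    if n is None or not 1 <= n <= 26:
        raise ValueError("shorthand choice_strings need n between 1 and 26")
    if choice_strings == "from_n":
        # Assumed rule: "1", "2", ..., "n"
        return [str(i + 1) for i in range(n)]
    if choice_strings == "from_n_abc":
        # Assumed rule: "a", "b", ... (first n lowercase letters)
        return list(string.ascii_lowercase[:n])
    if choice_strings == "from_n_ABC":
        # Assumed rule: "A", "B", ... (first n uppercase letters)
        return list(string.ascii_uppercase[:n])
    raise ValueError(f"unsupported choice_strings shorthand: {choice_strings!r}")


def expand_choice_scores(
    choice_scores: Optional[Union[dict[str, float], str]], choices: list[str]
) -> Optional[dict[str, float]]:
    """Hypothetical helper: turn a choice_scores value into an explicit mapping."""
    if choice_scores is None:
        return None
    if choice_scores == "from_strings":
        # Assumed rule: each choice string is itself a number, e.g. "3" -> 3.0
        return {choice: float(choice) for choice in choices}
    if isinstance(choice_scores, dict):
        return dict(choice_scores)
    raise ValueError(f"unsupported choice_scores value: {choice_scores!r}")


ratings = expand_choice_strings("from_n", n=5)
rating_scores = expand_choice_scores("from_strings", ratings)
```

Under these assumptions, "from_strings" only makes sense when every choice parses as a number, which is why it pairs naturally with "from_n".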

Outputs

Name | Type | Description
ModelGradedSpec instance | ModelGradedSpec | Configured spec ready for use by classify() or ModelBasedClassify

Usage Examples

YAML Spec for Factual Accuracy

# File: evals/registry/modelgraded/fact.yaml
prompt: >
  You are comparing a submitted answer to an expert answer on a given question.
  [Q]: {input}
  [A]: {ideal}
  [Submission]: {completion}
  Compare the submitted answer to the expert answer. Is the submission correct? Answer Yes, No, or Unsure.
choice_strings:
  - "Yes"
  - "No"
  - "Unsure"
input_outputs:
  input: completion
  ideal: expected
eval_type: cot_classify
choice_scores:
  "Yes": 1.0
  "No": 0.0
  "Unsure": 0.5
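
Once a spec like the one above has been parsed into a plain mapping, it can be sanity-checked against the fields listed in the I/O Contract before it is handed to the dataclass. The following is a minimal sketch, not part of evals: check_spec_mapping is a hypothetical helper, and the rule that explicit scores should cover every explicit choice is an assumption of this sketch.

```python
REQUIRED_FIELDS = {"prompt", "choice_strings", "input_outputs"}
OPTIONAL_FIELDS = {"eval_type", "choice_scores", "output_template", "key", "group"}
EVAL_TYPES = {"classify", "classify_cot", "cot_classify"}


def check_spec_mapping(raw: dict) -> list[str]:
    """Hypothetical pre-check: list the problems found in a parsed spec mapping."""
    problems = []
    names = set(raw)
    for name in sorted(REQUIRED_FIELDS - names):
        problems.append(f"missing required field: {name}")
    for name in sorted(names - REQUIRED_FIELDS - OPTIONAL_FIELDS):
        problems.append(f"unknown field: {name}")

    eval_type = raw.get("eval_type")
    if eval_type is not None and eval_type not in EVAL_TYPES:
        problems.append(f"unsupported eval_type: {eval_type}")

    choices = raw.get("choice_strings")
    scores = raw.get("choice_scores")
    # Assumption of this sketch: explicit scores should cover every explicit choice.
    if isinstance(choices, list) and isinstance(scores, dict):
        for choice in choices:
            if choice not in scores:
                problems.append(f"no score for choice: {choice}")
    return problems


# Mapping that mirrors the YAML example, as a YAML parser would return it.
fact_like = {
    "prompt": "[Q]: {input}\n[A]: {ideal}\n[Submission]: {completion}",
    "choice_strings": ["Yes", "No", "Unsure"],
    "input_outputs": {"input": "completion", "ideal": "expected"},
    "eval_type": "cot_classify",
    "choice_scores": {"Yes": 1.0, "No": 0.0, "Unsure": 0.5},
}
problems = check_spec_mapping(fact_like)
```

An empty list means the mapping uses only documented field names and values; the real dataclass may apply further validation that this sketch does not model.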

Programmatic Construction

from evals.elsuite.modelgraded.base import ModelGradedSpec

spec = ModelGradedSpec(
    prompt="Is the following answer correct? {input} Answer: {completion} Expected: {ideal}",
    choice_strings=["Yes", "No"],
    input_outputs={"input": "completion", "ideal": "expected"},
    eval_type="classify",
    choice_scores={"Yes": 1.0, "No": 0.0},
)
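
To show how the fields of such a spec fit together, here is a rough standalone sketch of filling the prompt template and looking up a score for a judged choice. SpecSketch is a stdlib stand-in carrying a subset of the fields, not the real ModelGradedSpec, and render_prompt and score_choice are hypothetical helpers; the actual classify function may handle templating and scoring differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecSketch:
    """Stdlib stand-in with a subset of the ModelGradedSpec fields (not the real class)."""

    prompt: str
    choice_strings: list[str]
    input_outputs: dict[str, str]
    eval_type: Optional[str] = None
    choice_scores: Optional[dict[str, float]] = None


def render_prompt(spec: SpecSketch, **fields: str) -> str:
    """Fill the {placeholders} in the prompt template with sample values."""
    return spec.prompt.format(**fields)


def score_choice(spec: SpecSketch, choice: str) -> Optional[float]:
    """Return the score for a judged choice, or None when no scores are configured."""
    if choice not in spec.choice_strings:
        raise ValueError(f"{choice!r} is not one of {spec.choice_strings}")
    if spec.choice_scores is None:
        return None
    return spec.choice_scores[choice]


spec = SpecSketch(
    prompt="Is the following answer correct? {input} Answer: {completion} Expected: {ideal}",
    choice_strings=["Yes", "No"],
    input_outputs={"input": "completion", "ideal": "expected"},
    eval_type="classify",
    choice_scores={"Yes": 1.0, "No": 0.0},
)

rendered = render_prompt(spec, input="What is 2 + 2?", completion="4", ideal="4")
yes_score = score_choice(spec, "Yes")
```

Keeping the scores optional mirrors the signature: a spec without choice_scores can still classify, it just yields no numeric score.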

Related Pages

Implements Principle

Uses Heuristic
