Implementation:Openai Evals Match Eval

Knowledge Sources	OpenAI Evals
Domains	Evaluation, NLP
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete tool for exact-match evaluation of model outputs against expected answers provided by the evals basic eval suite.

Description

The Match class is a built-in eval template that compares model completions against expected answers using exact string matching via record_and_check_match. It supports few-shot prompting by prepending example interactions from a separate JSONL file. The Includes and FuzzyMatch classes provide alternative matching strategies (substring and normalized F1 respectively) with the same interface. All three templates follow the same pattern: load samples, invoke the completion function, compare output to ideal, and record accuracy metrics.

Usage

Use Match when model output must exactly equal the expected answer. Reference it as "evals.elsuite.basic.match.Match" in YAML eval registration. For substring matching use Includes, for approximate matching use FuzzyMatch.

Code Reference

Source Location

Repository: openai/evals
File: evals/elsuite/basic/match.py (lines 9-65)

Signature

class Match(evals.Eval):
    def __init__(
        self,
        completion_fns: list[CompletionFn],
        samples_jsonl: str,
        *args,
        max_tokens: int = 500,
        num_few_shot: int = 0,
        few_shot_jsonl: str = None,
        **kwargs,
    ):
        """
        Args:
            completion_fns: List containing exactly one CompletionFn.
            samples_jsonl: Path to JSONL dataset with "input" and "ideal" fields.
            max_tokens: Maximum tokens for model generation (default 500).
            num_few_shot: Number of few-shot examples to prepend (default 0).
            few_shot_jsonl: Path to JSONL file with few-shot examples.
        """

    def eval_sample(self, sample: Any, *_) -> str:
        """Evaluate a single sample using exact match."""

    def run(self, recorder) -> dict:
        """Run evaluation and return accuracy metrics."""

Import

from evals.elsuite.basic.match import Match
from evals.elsuite.basic.includes import Includes
from evals.elsuite.basic.fuzzy_match import FuzzyMatch

I/O Contract

Inputs

Name	Type	Required	Description
completion_fns	list[CompletionFn]	Yes	Exactly one completion function
samples_jsonl	str	Yes	Path to JSONL with "input" and "ideal" keys
max_tokens	int	No	Max generation tokens (default 500)
num_few_shot	int	No	Number of few-shot examples (default 0)
few_shot_jsonl	str	No	Path to JSONL with few-shot examples (required if num_few_shot > 0)

Outputs

Name	Type	Description
run() returns	dict	{"accuracy": float, "boostrap_std": float}

Usage Examples

YAML Registration for Match

my-exact-match:
  id: my-exact-match.dev.v0
  metrics: [accuracy]

my-exact-match.dev.v0:
  class: evals.elsuite.basic.match.Match
  args:
    samples_jsonl: my_data/questions.jsonl
    max_tokens: 100

YAML Registration for FuzzyMatch

my-fuzzy-eval:
  id: my-fuzzy-eval.dev.v0
  metrics: [accuracy]

my-fuzzy-eval.dev.v0:
  class: evals.elsuite.basic.fuzzy_match.FuzzyMatch
  args:
    samples_jsonl: my_data/freeform.jsonl
    max_tokens: 200

Running via CLI

oaieval gpt-3.5-turbo my-exact-match
oaieval gpt-4 my-fuzzy-eval --max_samples 50

Related Pages

Implements Principle

Principle:Openai_Evals_Eval_Template_Selection

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment