Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Openai Evals Match Eval

From Leeroopedia
Knowledge Sources
Domains Evaluation, NLP
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete tool for exact-match evaluation of model outputs against expected answers provided by the evals basic eval suite.

Description

The Match class is a built-in eval template that compares model completions against expected answers using exact string matching via record_and_check_match. It supports few-shot prompting by prepending example interactions from a separate JSONL file. The Includes and FuzzyMatch classes provide alternative matching strategies (substring and normalized F1 respectively) with the same interface. All three templates follow the same pattern: load samples, invoke the completion function, compare output to ideal, and record accuracy metrics.

Usage

Use Match when model output must exactly equal the expected answer. Reference it as "evals.elsuite.basic.match.Match" in YAML eval registration. For substring matching use Includes, for approximate matching use FuzzyMatch.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/elsuite/basic/match.py (lines 9-65)

Signature

class Match(evals.Eval):
    def __init__(
        self,
        completion_fns: list[CompletionFn],
        samples_jsonl: str,
        *args,
        max_tokens: int = 500,
        num_few_shot: int = 0,
        few_shot_jsonl: str = None,
        **kwargs,
    ):
        """
        Args:
            completion_fns: List containing exactly one CompletionFn.
            samples_jsonl: Path to JSONL dataset with "input" and "ideal" fields.
            max_tokens: Maximum tokens for model generation (default 500).
            num_few_shot: Number of few-shot examples to prepend (default 0).
            few_shot_jsonl: Path to JSONL file with few-shot examples.
        """

    def eval_sample(self, sample: Any, *_) -> str:
        """Evaluate a single sample using exact match."""

    def run(self, recorder) -> dict:
        """Run evaluation and return accuracy metrics."""

Import

from evals.elsuite.basic.match import Match
from evals.elsuite.basic.includes import Includes
from evals.elsuite.basic.fuzzy_match import FuzzyMatch

I/O Contract

Inputs

Name Type Required Description
completion_fns list[CompletionFn] Yes Exactly one completion function
samples_jsonl str Yes Path to JSONL with "input" and "ideal" keys
max_tokens int No Max generation tokens (default 500)
num_few_shot int No Number of few-shot examples (default 0)
few_shot_jsonl str No Path to JSONL with few-shot examples (required if num_few_shot > 0)

Outputs

Name Type Description
run() returns dict {"accuracy": float, "boostrap_std": float}

Usage Examples

YAML Registration for Match

my-exact-match:
  id: my-exact-match.dev.v0
  metrics: [accuracy]

my-exact-match.dev.v0:
  class: evals.elsuite.basic.match.Match
  args:
    samples_jsonl: my_data/questions.jsonl
    max_tokens: 100

YAML Registration for FuzzyMatch

my-fuzzy-eval:
  id: my-fuzzy-eval.dev.v0
  metrics: [accuracy]

my-fuzzy-eval.dev.v0:
  class: evals.elsuite.basic.fuzzy_match.FuzzyMatch
  args:
    samples_jsonl: my_data/freeform.jsonl
    max_tokens: 200

Running via CLI

oaieval gpt-3.5-turbo my-exact-match
oaieval gpt-4 my-fuzzy-eval --max_samples 50

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment