Implementation:Openai Evals Match Eval
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, NLP |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Concrete tool for exact-match evaluation of model outputs against expected answers provided by the evals basic eval suite.
Description
The Match class is a built-in eval template that compares model completions against expected answers using exact string matching via record_and_check_match. It supports few-shot prompting by prepending example interactions from a separate JSONL file. The Includes and FuzzyMatch classes provide alternative matching strategies (substring and normalized F1 respectively) with the same interface. All three templates follow the same pattern: load samples, invoke the completion function, compare output to ideal, and record accuracy metrics.
Usage
Use Match when model output must exactly equal the expected answer. Reference it as "evals.elsuite.basic.match.Match" in YAML eval registration. For substring matching use Includes, for approximate matching use FuzzyMatch.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/elsuite/basic/match.py (lines 9-65)
Signature
class Match(evals.Eval):
def __init__(
self,
completion_fns: list[CompletionFn],
samples_jsonl: str,
*args,
max_tokens: int = 500,
num_few_shot: int = 0,
few_shot_jsonl: str = None,
**kwargs,
):
"""
Args:
completion_fns: List containing exactly one CompletionFn.
samples_jsonl: Path to JSONL dataset with "input" and "ideal" fields.
max_tokens: Maximum tokens for model generation (default 500).
num_few_shot: Number of few-shot examples to prepend (default 0).
few_shot_jsonl: Path to JSONL file with few-shot examples.
"""
def eval_sample(self, sample: Any, *_) -> str:
"""Evaluate a single sample using exact match."""
def run(self, recorder) -> dict:
"""Run evaluation and return accuracy metrics."""
Import
from evals.elsuite.basic.match import Match
from evals.elsuite.basic.includes import Includes
from evals.elsuite.basic.fuzzy_match import FuzzyMatch
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| completion_fns | list[CompletionFn] | Yes | Exactly one completion function |
| samples_jsonl | str | Yes | Path to JSONL with "input" and "ideal" keys |
| max_tokens | int | No | Max generation tokens (default 500) |
| num_few_shot | int | No | Number of few-shot examples (default 0) |
| few_shot_jsonl | str | No | Path to JSONL with few-shot examples (required if num_few_shot > 0) |
Outputs
| Name | Type | Description |
|---|---|---|
| run() returns | dict | {"accuracy": float, "boostrap_std": float} |
Usage Examples
YAML Registration for Match
my-exact-match:
id: my-exact-match.dev.v0
metrics: [accuracy]
my-exact-match.dev.v0:
class: evals.elsuite.basic.match.Match
args:
samples_jsonl: my_data/questions.jsonl
max_tokens: 100
YAML Registration for FuzzyMatch
my-fuzzy-eval:
id: my-fuzzy-eval.dev.v0
metrics: [accuracy]
my-fuzzy-eval.dev.v0:
class: evals.elsuite.basic.fuzzy_match.FuzzyMatch
args:
samples_jsonl: my_data/freeform.jsonl
max_tokens: 200
Running via CLI
oaieval gpt-3.5-turbo my-exact-match
oaieval gpt-4 my-fuzzy-eval --max_samples 50