Implementation: OpenAI Evals CompletionFn Protocol
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Software_Architecture |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A runtime-checkable protocol and an abstract base class defining the model-integration interface provided by the evals.api module.
Description
The CompletionFn protocol defines the callable interface (__call__(prompt, **kwargs) -> CompletionResult) and CompletionResult defines the abstract result interface (get_completions() -> list[str]). Reference implementations include OpenAIChatCompletionFn and OpenAICompletionFn for OpenAI models, plus DummyCompletionFn for testing. The helper function record_and_check_match provides a standard pattern for comparing completions against expected answers.
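The shape described above can be sketched in self-contained form. This is a minimal illustration of the two interfaces, not the evals source itself (the `OpenAICreateChatPrompt` prompt alias is omitted for brevity); because `CompletionFn` is a `@runtime_checkable` `Protocol`, any callable class with a matching `__call__` satisfies it structurally, with no inheritance required:

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class CompletionResult(ABC):
    """Abstract result interface: subclasses must return completion strings."""

    @abstractmethod
    def get_completions(self) -> list[str]:
        """Return the list of completion strings."""


@runtime_checkable
class CompletionFn(Protocol):
    """Callable interface: prompt in, CompletionResult out."""

    def __call__(self, prompt, **kwargs) -> CompletionResult: ...


# EchoResult/EchoCompletionFn are illustrative names, not evals classes.
class EchoResult(CompletionResult):
    def __init__(self, text: str):
        self.text = text

    def get_completions(self) -> list[str]:
        return [self.text]


class EchoCompletionFn:
    # No subclassing of CompletionFn: structural typing is enough.
    def __call__(self, prompt, **kwargs) -> CompletionResult:
        return EchoResult(str(prompt))


assert isinstance(EchoCompletionFn(), CompletionFn)  # passes via runtime_checkable
```

The `isinstance` check only verifies that a `__call__` method exists; it does not validate the signature or return type, which is why implementations should still follow the documented contract.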
Usage
Implement the CompletionFn protocol and subclass CompletionResult when creating a custom model integration. Call record_and_check_match in eval_sample methods for standard match recording.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/api.py (lines 16-105)
Signature
class CompletionResult(ABC):
    @abstractmethod
    def get_completions(self) -> list[str]:
        """Return list of completion strings."""

@runtime_checkable
class CompletionFn(Protocol):
    def __call__(
        self,
        prompt: Union[str, OpenAICreateChatPrompt],
        **kwargs,
    ) -> CompletionResult:
        """
        Args:
            prompt: Either a text string or a list of {"role": str, "content": str} dicts.
            **kwargs: Model-specific arguments (temperature, max_tokens, etc.).
        Returns:
            CompletionResult with a get_completions() method.
        """
def record_and_check_match(
    prompt: Any,
    sampled: str,
    expected: Union[str, list[str], tuple[str]],
    separator: Callable[[str], bool] = None,
    options: Optional[list[str]] = None,
) -> Optional[str]:
    """
    Records and checks if a sampled response matches the expected result.
    Args:
        prompt: The input prompt.
        sampled: The sampled response from the model.
        expected: Expected response or list of acceptable responses.
        separator: Optional function to check separator characters.
        options: Optional list of options to match against.
    Returns:
        The matched option string, or None if no match.
    """
Import
from evals.api import CompletionFn, CompletionResult, DummyCompletionFn, record_and_check_match
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | Union[str, list[dict]] | Yes | Text prompt or chat message list |
| **kwargs | Any | No | Model-specific params (temperature, max_tokens, etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| CompletionResult | CompletionResult | Object with get_completions() returning list[str] |
Usage Examples
Implementing a Custom CompletionFn
from evals.api import CompletionFn, CompletionResult

class MyCompletionResult(CompletionResult):
    def __init__(self, text: str):
        self.text = text

    def get_completions(self) -> list[str]:
        return [self.text]

class MyCompletionFn:
    """Custom CompletionFn - no inheritance needed, just match the protocol."""

    def __call__(self, prompt, **kwargs) -> CompletionResult:
        # Replace with your own model call; my_model is a placeholder.
        response = my_model.generate(prompt, **kwargs)
        return MyCompletionResult(response)
Using record_and_check_match
from evals.api import record_and_check_match
# In an eval_sample method
result = self.completion_fn(prompt=sample["input"], temperature=0.0)
sampled = result.get_completions()[0]
picked = record_and_check_match(
prompt=sample["input"],
sampled=sampled,
expected=sample["ideal"],
)
# Records a "match" event and returns the picked option