Implementation: OpenAI Evals CompletionFn Protocol
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Software_Architecture |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A runtime-checkable protocol and an abstract base class defining the model-integration interface provided by the evals.api module.
Description
The CompletionFn protocol defines the callable interface (__call__(prompt, **kwargs) -> CompletionResult) and CompletionResult defines the abstract result interface (get_completions() -> list[str]). Reference implementations include OpenAIChatCompletionFn and OpenAICompletionFn for OpenAI models, plus DummyCompletionFn for testing. The helper function record_and_check_match provides a standard pattern for comparing completions against expected answers.
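The shape described above can be sketched in self-contained form. This is a minimal illustration of the two interfaces, not the evals source itself (the `OpenAICreateChatPrompt` prompt alias is omitted for brevity); because `CompletionFn` is a `@runtime_checkable` `Protocol`, any callable class with a matching `__call__` satisfies it structurally, with no inheritance required:

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class CompletionResult(ABC):
    """Abstract result interface: subclasses must return completion strings."""

    @abstractmethod
    def get_completions(self) -> list[str]:
        """Return the list of completion strings."""


@runtime_checkable
class CompletionFn(Protocol):
    """Callable interface: prompt in, CompletionResult out."""

    def __call__(self, prompt, **kwargs) -> CompletionResult: ...


# EchoResult/EchoCompletionFn are illustrative names, not evals classes.
class EchoResult(CompletionResult):
    def __init__(self, text: str):
        self.text = text

    def get_completions(self) -> list[str]:
        return [self.text]


class EchoCompletionFn:
    # No subclassing of CompletionFn: structural typing is enough.
    def __call__(self, prompt, **kwargs) -> CompletionResult:
        return EchoResult(str(prompt))


assert isinstance(EchoCompletionFn(), CompletionFn)  # passes via runtime_checkable
```

The `isinstance` check only verifies that a `__call__` method exists; it does not validate the signature or return type, which is why implementations should still follow the documented contract.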
Usage
Implement the CompletionFn protocol and subclass CompletionResult when creating a custom model integration. Call record_and_check_match in eval_sample methods for standard match recording.
Code Reference
Source Location
- Repository: openai/evals
- File: evals/api.py (lines 16-105)
Signature
class CompletionResult(ABC):
    @abstractmethod
    def get_completions(self) -> list[str]:
        """Return list of completion strings."""

@runtime_checkable
class CompletionFn(Protocol):
    def __call__(
        self,
        prompt: Union[str, OpenAICreateChatPrompt],
        **kwargs,
    ) -> CompletionResult:
        """
        Args:
            prompt: Either a text string or a list of {"role": str, "content": str} dicts.
            **kwargs: Model-specific arguments (temperature, max_tokens, etc.).
        Returns:
            CompletionResult with a get_completions() method.
        """
def record_and_check_match(
    prompt: Any,
    sampled: str,
    expected: Union[str, list[str], tuple[str]],
    separator: Callable[[str], bool] = None,
    options: Optional[list[str]] = None,
) -> Optional[str]:
    """
    Records and checks if a sampled response matches the expected result.
    Args:
        prompt: The input prompt.
        sampled: The sampled response from the model.
        expected: Expected response or list of acceptable responses.
        separator: Optional function to check separator characters.
        options: Optional list of options to match against.
    Returns:
        The matched option string, or None if no match.
    """
Import
from evals.api import CompletionFn, CompletionResult, DummyCompletionFn, record_and_check_match
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | Union[str, list[dict]] | Yes | Text prompt or chat message list |
| **kwargs | Any | No | Model-specific params (temperature, max_tokens, etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| CompletionResult | CompletionResult | Object with get_completions() returning list[str] |
Usage Examples
Implementing a Custom CompletionFn
from evals.api import CompletionFn, CompletionResult

class MyCompletionResult(CompletionResult):
    def __init__(self, text: str):
        self.text = text

    def get_completions(self) -> list[str]:
        return [self.text]

class MyCompletionFn:
    """Custom CompletionFn - no inheritance needed, just match the protocol."""

    def __call__(self, prompt, **kwargs) -> CompletionResult:
        # Replace with your own model call; my_model is a placeholder.
        response = my_model.generate(prompt, **kwargs)
        return MyCompletionResult(response)
Using record_and_check_match
from evals.api import record_and_check_match
# In an eval_sample method
result = self.completion_fn(prompt=sample["input"], temperature=0.0)
sampled = result.get_completions()[0]
picked = record_and_check_match(
prompt=sample["input"],
sampled=sampled,
expected=sample["ideal"],
)
# Records a "match" event and returns the picked option