Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Openai Evals Lambada

From Leeroopedia
Knowledge Sources
Domains Evaluation, Language Modeling
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete eval for measuring next-word prediction accuracy on the LAMBADA benchmark, provided by the evals library.

Description

The Lambada class implements a language-model evaluation based on the LAMBADA benchmark (loaded from the EleutherAI/lambada_openai HuggingFace dataset). For each sample, it splits the text into a context (all words except the last) and an expected answer (the final word). It then constructs a prompt asking the model to predict the most likely next word, calls the completion function with temperature=0.0 and max_tokens=8, and uses evals.record_and_check_match to record whether the sampled output matches the expected word. The run method loads the specified subset's test split, evaluates all samples, and returns an accuracy metric computed from recorded match events.

Usage

Import Lambada when configuring an eval that tests a language model's ability to predict the final word of naturally occurring passages. The class is typically registered in an eval YAML spec with a subset parameter (e.g., "en" for English) selecting the LAMBADA language variant.

Code Reference

Source Location

Signature

class Lambada(evals.Eval):
    def __init__(
        self,
        completion_fns: list[CompletionFn],
        subset: str,
        *args,
        **kwargs,
    ):
        ...

    def eval_sample(self, sample, rng):
        ...

    def run(self, recorder: RecorderBase) -> dict:
        ...

Import

from evals.elsuite.lambada import Lambada

I/O Contract

Inputs

__init__

Name Type Required Description
completion_fns list[CompletionFn] Yes List containing exactly one completion function to evaluate
subset str Yes Language subset of the LAMBADA dataset to use (e.g., "en", "de", "fr")
*args Any No Positional arguments forwarded to the parent evals.Eval constructor
**kwargs Any No Keyword arguments forwarded to the parent evals.Eval constructor

eval_sample

Name Type Required Description
sample dict Yes A single dataset row with a "text" field containing a complete sentence
rng Random Yes Random number generator (unused in this eval)

run

Name Type Required Description
recorder RecorderBase Yes Recorder instance that collects match events during evaluation

Outputs

run

Name Type Description
accuracy float Fraction of samples where the model's prediction exactly matched the expected final word

Usage Examples

from evals.elsuite.lambada import Lambada
from evals.api import CompletionFn
from evals.record import RecorderBase

# Assuming `my_completion_fn` is a configured CompletionFn instance
# and `recorder` is a RecorderBase instance:
lambada_eval = Lambada(
    completion_fns=[my_completion_fn],
    subset="en",
)
results = lambada_eval.run(recorder)
print(f"Accuracy: {results['accuracy']:.2%}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment