Overview
Concrete eval for measuring next-word prediction accuracy on the LAMBADA benchmark, provided by the evals library.
Description
The Lambada class implements a language-model evaluation based on the LAMBADA benchmark (loaded from the EleutherAI/lambada_openai HuggingFace dataset). For each sample, it splits the text into a context (all words except the last) and an expected answer (the final word). It then constructs a prompt asking the model to predict the most likely next word, calls the completion function with temperature=0.0 and max_tokens=8, and uses evals.record_and_check_match to record whether the sampled output matches the expected word. The run method loads the specified subset's test split, evaluates all samples, and returns an accuracy metric computed from recorded match events.
Usage
Import Lambada when configuring an eval that tests a language model's ability to predict the final word of naturally occurring passages. The class is typically registered in an eval YAML spec with a subset parameter (e.g., "en" for English) selecting the LAMBADA language variant.
Code Reference
Source Location
Signature
class Lambada(evals.Eval):
def __init__(
self,
completion_fns: list[CompletionFn],
subset: str,
*args,
**kwargs,
):
...
def eval_sample(self, sample, rng):
...
def run(self, recorder: RecorderBase) -> dict:
...
Import
from evals.elsuite.lambada import Lambada
I/O Contract
Inputs
__init__
| Name |
Type |
Required |
Description
|
| completion_fns |
list[CompletionFn] |
Yes |
List containing exactly one completion function to evaluate
|
| subset |
str |
Yes |
Language subset of the LAMBADA dataset to use (e.g., "en", "de", "fr")
|
| *args |
Any |
No |
Positional arguments forwarded to the parent evals.Eval constructor
|
| **kwargs |
Any |
No |
Keyword arguments forwarded to the parent evals.Eval constructor
|
eval_sample
| Name |
Type |
Required |
Description
|
| sample |
dict |
Yes |
A single dataset row with a "text" field containing a complete sentence
|
| rng |
Random |
Yes |
Random number generator (unused in this eval)
|
run
| Name |
Type |
Required |
Description
|
| recorder |
RecorderBase |
Yes |
Recorder instance that collects match events during evaluation
|
Outputs
run
| Name |
Type |
Description
|
| accuracy |
float |
Fraction of samples where the model's prediction exactly matched the expected final word
|
Usage Examples
from evals.elsuite.lambada import Lambada
from evals.api import CompletionFn
from evals.record import RecorderBase
# Assuming `my_completion_fn` is a configured CompletionFn instance
# and `recorder` is a RecorderBase instance:
lambada_eval = Lambada(
completion_fns=[my_completion_fn],
subset="en",
)
results = lambada_eval.run(recorder)
print(f"Accuracy: {results['accuracy']:.2%}")
Related Pages
Page Connections
Double-click a node to navigate. Hold to expand connections.