Implementation:Openai Evals Get Jsonl
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Engineering |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A concrete tool, provided by the evals data module, for loading JSONL evaluation datasets from local or remote paths.
Description
The get_jsonl function loads JSON Lines files from a given path, which may be a single file or a directory; for a directory, it recursively loads all .jsonl files within. It uses blobfile for transparent access to the local filesystem, GCS, and other cloud storage backends. Compressed files (.gz, .lz4, .zst) are decompressed automatically. Each line is parsed as a JSON object, and a parse failure produces an error message that includes the file path and line number.
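The behavior described above can be illustrated with a minimal standard-library sketch. This is not the evals implementation: it uses `os` and `gzip` in place of blobfile, handles only local paths and `.gz` compression, and skips blank lines as a simplifying assumption. The names `load_jsonl_sketch` and `_read_lines` are hypothetical.

```python
import gzip
import json
import os


def _read_lines(path: str) -> list[str]:
    """Read text lines, decompressing .gz files on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, mode="rt", encoding="utf-8") as f:
        return f.readlines()


def load_jsonl_sketch(path: str) -> list[dict]:
    """Hypothetical local-only stand-in for get_jsonl."""
    if os.path.isdir(path):
        records: list[dict] = []
        # Sorted for a deterministic order; recurse into subdirectories
        # and pick up files whose names end in .jsonl.
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            if os.path.isdir(child) or name.endswith(".jsonl"):
                records += load_jsonl_sketch(child)
        return records

    records = []
    for lineno, line in enumerate(_read_lines(path), start=1):
        if not line.strip():
            continue  # assumption of this sketch: ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as e:
            # Report the file path and 1-based line number of the bad line.
            raise ValueError(f"{path}:{lineno}: invalid JSON: {e}") from e
    return records
```

The sketch keeps the same three concerns separate that the description names: path dispatch (file versus directory), transparent decompression, and per-line parsing with a located error.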
Usage
Use this function when loading evaluation datasets in JSONL format. It is called internally by Eval.get_samples() and can be used directly for dataset validation or preprocessing.
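As a sketch of the dataset validation use case, the helper below checks that every loaded sample carries a set of required keys. The helper name `find_invalid_samples` is hypothetical, and the default keys `input` and `ideal` are an assumption based on the sample shape shown in the usage examples on this page; other evals may expect different fields.

```python
def find_invalid_samples(
    samples: list[dict],
    required_keys: tuple[str, ...] = ("input", "ideal"),
) -> list[tuple[int, list[str]]]:
    """Return (index, missing_keys) for each sample lacking a required key."""
    problems = []
    for index, sample in enumerate(samples):
        missing = [key for key in required_keys if key not in sample]
        if missing:
            problems.append((index, missing))
    return problems


# In practice the list would come from get_jsonl(path); inline data is
# used here so the sketch runs without the evals package installed.
samples = [
    {"input": [], "ideal": "a"},
    {"input": []},
    {},
]
print(find_invalid_samples(samples))
# → [(1, ['ideal']), (2, ['input', 'ideal'])]
```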
Code Reference
Source Location
- Repository: openai/evals
- File: evals/data.py (lines 120-133)
Signature
def get_jsonl(path: str) -> list[dict]:
    """
    Extract json lines from the given path.
    If the path is a directory, look in subpaths recursively.
    Return all lines from all jsonl files as a single list.

    Args:
        path: Path to a JSONL file or directory containing JSONL files.
            Supports local paths, cloud URLs (via blobfile), and
            compressed formats (.gz, .lz4, .zst).

    Returns:
        List of dictionaries, one per JSON line.

    Raises:
        ValueError: On JSON parse errors with file:line detail.
        RuntimeError: If file cannot be opened.
    """
Import
from evals.data import get_jsonl
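The two exceptions listed under Raises in the docstring above suggest a simple handling pattern, sketched below. The wrapper `try_load` is hypothetical; it accepts the loader as an argument so the sketch runs without the evals package, and in practice `get_jsonl` would be passed in as the loader.

```python
from typing import Callable, Optional


def try_load(loader: Callable[[str], list], path: str) -> Optional[list]:
    """Call loader(path), reporting failures instead of propagating them."""
    try:
        return loader(path)
    except ValueError as e:
        # Per the docstring: malformed JSON, with file:line detail.
        print(f"Malformed JSONL in {path}: {e}")
    except RuntimeError as e:
        # Per the docstring: the file could not be opened.
        print(f"Could not open {path}: {e}")
    return None
```

A call such as `try_load(get_jsonl, "data/my_eval.jsonl")` would then return the parsed samples, or `None` after printing the reason for the failure.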
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to a JSONL file or directory. Supports local paths, cloud URLs (via blobfile), and compressed files (.gz, .lz4, .zst). |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | list[dict] | List of parsed JSON objects, one per line across all files |
Usage Examples
Load a Local Dataset
from evals.data import get_jsonl
# Load from a single file
samples = get_jsonl("evals/registry/data/test/test_match.jsonl")
print(f"Loaded {len(samples)} samples")
print(samples[0])  # e.g. {"input": [...], "ideal": "expected_answer"}
# Load from a directory (all .jsonl files)
all_samples = get_jsonl("evals/registry/data/test/")
Load Compressed Data
from evals.data import get_jsonl
# Automatically handles compression
samples = get_jsonl("data/my_eval.jsonl.gz")
samples = get_jsonl("data/my_eval.jsonl.lz4")
samples = get_jsonl("data/my_eval.jsonl.zst")
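To try the compressed path without an existing dataset, a small gzip-compressed JSONL fixture can be written with the standard library alone. This is a sketch: the helper name `write_jsonl_gz` and the file name used in the note below are hypothetical, and only `.gz` is covered because the `.lz4` and `.zst` formats need third-party packages.

```python
import gzip
import json


def write_jsonl_gz(path: str, records: list[dict]) -> None:
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

After calling, for example, `write_jsonl_gz("my_eval.jsonl.gz", records)`, the resulting file can be passed to `get_jsonl` in the same way as the examples above.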