Implementation:Openai Evals Get Jsonl

Knowledge Sources	OpenAI Evals
Domains	Evaluation, Data_Engineering
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete tool for loading JSONL evaluation datasets from local or remote paths provided by the evals data module.

Description

The get_jsonl function loads JSON Lines files from a given path, supporting both individual files and directories (recursively loading all .jsonl files within). It uses blobfile for transparent access to local filesystem, GCS, and other cloud storage backends. Compressed files (.gz, .lz4, .zst) are automatically decompressed. Each line is parsed as a JSON object, with detailed error messages on parse failures including file path and line number.

Usage

Use this function when loading evaluation datasets in JSONL format. It is called internally by Eval.get_samples() and can be used directly for dataset validation or preprocessing.

Code Reference

Source Location

Repository: openai/evals
File: evals/data.py (lines 120-133)

Signature

def get_jsonl(path: str) -> list[dict]:
    """
    Extract json lines from the given path.
    If the path is a directory, look in subpaths recursively.

    Return all lines from all jsonl files as a single list.

    Args:
        path: Path to a JSONL file or directory containing JSONL files.
              Supports local paths, cloud URLs (via blobfile), and
              compressed formats (.gz, .lz4, .zst).

    Returns:
        List of dictionaries, one per JSON line.

    Raises:
        ValueError: On JSON parse errors with file:line detail.
        RuntimeError: If file cannot be opened.
    """

Import

from evals.data import get_jsonl

I/O Contract

Inputs

Name	Type	Required	Description
path	str	Yes	Path to JSONL file or directory. Supports local, cloud (blobfile), and compressed formats.

Outputs

Name	Type	Description
return value	list[dict]	List of parsed JSON objects, one per line across all files

Usage Examples

Load a Local Dataset

from evals.data import get_jsonl

# Load from a single file
samples = get_jsonl("evals/registry/data/test/test_match.jsonl")
print(f"Loaded {len(samples)} samples")
print(samples[0])  # {"input": [...], "ideal": "expected_answer"}

# Load from a directory (all .jsonl files)
all_samples = get_jsonl("evals/registry/data/test/")

Load Compressed Data

from evals.data import get_jsonl

# Automatically handles compression
samples = get_jsonl("data/my_eval.jsonl.gz")
samples = get_jsonl("data/my_eval.jsonl.lz4")
samples = get_jsonl("data/my_eval.jsonl.zst")

Related Pages

Implements Principle

Principle:Openai_Evals_Dataset_Preparation

Uses Heuristic

Heuristic:Openai_Evals_Chat_Format_Recommendation

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment