

Implementation:Openai Evals Get Jsonl

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Data_Engineering
Last Updated: 2026-02-14 10:00 GMT

Overview

A concrete tool, provided by the evals data module (evals/data.py), for loading JSONL evaluation datasets from local or remote paths.

Description

The get_jsonl function loads JSON Lines files from a given path. The path may point to a single file or to a directory; for a directory, it recursively loads every .jsonl file within. It uses blobfile for transparent access to the local filesystem, GCS, and other cloud storage backends. Files ending in .gz, .lz4, or .zst are decompressed automatically based on their extension. Each line is parsed as a JSON object, and a parse failure produces an error message that includes the file path and line number.
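To make the description concrete, here is a minimal, local-only sketch of the same loading behavior using only the Python standard library. It is not the evals implementation: the names `load_jsonl_sketch` and `_open_text` are hypothetical, `os.walk` stands in for blobfile's directory listing, and only `.gz` decompression is handled (the real function also covers `.lz4`, `.zst`, and cloud paths).

```python
import gzip
import json
import os
from typing import IO


def _open_text(path: str) -> IO[str]:
    """Hypothetical helper: open a local file as text, gunzipping if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")


def load_jsonl_sketch(path: str) -> list[dict]:
    """Hypothetical, local-only approximation of get_jsonl (not the evals code)."""
    if os.path.isdir(path):
        result: list[dict] = []
        for root, dirnames, filenames in os.walk(path):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                # Directory mode only picks up files whose names end in .jsonl.
                if name.endswith(".jsonl"):
                    result.extend(load_jsonl_sketch(os.path.join(root, name)))
        return result
    with _open_text(path) as f:
        # One JSON object per line; all records are returned as a flat list.
        return [json.loads(line) for line in f]
```

Because the directory branch filters on the `.jsonl` suffix, a `.jsonl.gz` file inside a directory is skipped by this sketch; compressed files are read when passed directly as a single path.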

Usage

Use this function to load evaluation datasets in JSONL format. It is called internally by Eval.get_samples(), and it can also be called directly to validate or preprocess a dataset.
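As an example of direct validation, the helper below checks the list returned by get_jsonl for missing keys. It is a sketch under stated assumptions: the name `find_invalid_samples` is hypothetical, and the default keys `input` and `ideal` are taken from the sample record shown in the usage examples; which keys a given eval actually requires depends on the eval class.

```python
from typing import Any, Iterable


def find_invalid_samples(
    samples: list[dict[str, Any]],
    required_keys: Iterable[str] = ("input", "ideal"),  # assumed keys, adjust per eval
) -> list[tuple[int, list[str]]]:
    """Hypothetical helper: return (index, missing_keys) for each incomplete sample."""
    required = list(required_keys)
    problems = []
    for i, sample in enumerate(samples):
        missing = [key for key in required if key not in sample]
        if missing:
            problems.append((i, missing))
    return problems
```

In practice this would be fed the output of the loader, e.g. `find_invalid_samples(get_jsonl(path))`, and an empty result means every record carries the assumed keys.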

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/data.py (lines 120-133)

Signature

def get_jsonl(path: str) -> list[dict]:
    """
    Extract json lines from the given path.
    If the path is a directory, look in subpaths recursively.

    Return all lines from all jsonl files as a single list.

    Args:
        path: Path to a JSONL file or directory containing JSONL files.
              Supports local paths, cloud URLs (via blobfile), and
              compressed formats (.gz, .lz4, .zst).

    Returns:
        List of dictionaries, one per JSON line.

    Raises:
        ValueError: On JSON parse errors with file:line detail.
        RuntimeError: If file cannot be opened.
    """

Import

from evals.data import get_jsonl

I/O Contract

Inputs

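| Name | Type | Required | Description |
|------|------|----------|-------------|
| path | str | Yes | Path to a JSONL file or a directory of JSONL files. Supports local paths, cloud URLs (via blobfile), and compressed formats (.gz, .lz4, .zst). |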

Outputs

| Name | Type | Description |
|------|------|-------------|
| return value | list[dict] | Parsed JSON objects, one per line, concatenated across all loaded files. |

Usage Examples

Load a Local Dataset

from evals.data import get_jsonl

# Load from a single file
samples = get_jsonl("evals/registry/data/test/test_match.jsonl")
print(f"Loaded {len(samples)} samples")
print(samples[0])  # e.g. {'input': [...], 'ideal': 'expected_answer'}

# Load from a directory (all .jsonl files)
all_samples = get_jsonl("evals/registry/data/test/")

Load Compressed Data

from evals.data import get_jsonl

# Automatically handles compression
samples = get_jsonl("data/my_eval.jsonl.gz")
samples = get_jsonl("data/my_eval.jsonl.lz4")
samples = get_jsonl("data/my_eval.jsonl.zst")
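To produce a compressed file such as data/my_eval.jsonl.gz in the first place, the standard library's gzip module is enough. The sketch below is illustrative: `write_jsonl_gz` is a hypothetical helper, and it covers only gzip; writing `.lz4` or `.zst` files would need the third-party lz4 or zstandard packages, which are not shown here.

```python
import gzip
import json
from typing import Any, Iterable


def write_jsonl_gz(path: str, records: Iterable[dict[str, Any]]) -> int:
    """Hypothetical helper: write records as gzip-compressed JSON Lines."""
    count = 0
    with gzip.open(path, mode="wt", encoding="utf-8") as f:
        for record in records:
            # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The returned count gives a quick sanity check against `len(get_jsonl(path))` after the file is written.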

Related Pages

Implements Principle

Uses Heuristic
