Implementation:Openai Evals Get Final Results From Dir

Knowledge Sources	OpenAI Evals
Domains	Evaluation, Data_Analysis
Last Updated	2026-02-14 10:00 GMT

Overview

Concrete tool for extracting final results from evaluation log files provided by the evals log utilities module.

Description

The get_final_results_from_dir function scans a directory for JSONL log files, extracts the final_report entry from each, and returns a dictionary mapping file paths to their results. Companion functions extract_final_results and extract_individual_results provide single-file extraction and sample-level metric extraction respectively.

Usage

Use these functions after completing evaluation runs to programmatically access results. Useful for building comparison tables, dashboards, or automated reporting.

Code Reference

Source Location

Repository: openai/evals
File: evals/utils/log_utils.py (lines 6-61)

Signature

def get_final_results_from_dir(log_dir: Union[str, Path]) -> dict[Path, dict]:
    """
    Given a directory of log files, return a dictionary mapping
    log file paths to final results.

    Args:
        log_dir: Path to directory containing .log files.

    Returns:
        Dictionary mapping each log file Path to its final_report dict.
    """

def extract_final_results(path: Path) -> dict:
    """
    Given a path to a log file, find and return the "final_report" dictionary.

    Args:
        path: Path to a single JSONL log file.

    Returns:
        The final_report dictionary.

    Raises:
        ValueError: If no final_report found in the file.
    """

def extract_individual_results(path: Path, type_string: str = "metrics") -> list[dict]:
    """
    Given a path to a log file, grab all the individual sample results.

    Args:
        path: Path to a single JSONL log file.
        type_string: Event type to filter (default "metrics").

    Returns:
        List of data dictionaries from matching events.
    """

Import

from evals.utils.log_utils import (
    get_final_results_from_dir,
    extract_final_results,
    extract_individual_results,
)

I/O Contract

Inputs

Name	Type	Required	Description
log_dir	Union[str, Path]	Yes (for get_final_results_from_dir)	Directory containing .log files
path	Path	Yes (for extract_*)	Path to individual log file
type_string	str	No	Event type to filter (default "metrics")

Outputs

Name	Type	Description
get_final_results_from_dir	dict[Path, dict]	Map of log file path to final_report dict
extract_final_results	dict	Single final_report dictionary
extract_individual_results	list[dict]	List of per-sample result dictionaries

Usage Examples

Aggregate Results from a Directory

from pathlib import Path
from evals.utils.log_utils import get_final_results_from_dir

results = get_final_results_from_dir("/tmp/evallogs/")

for log_path, final_report in results.items():
    print(f"{log_path.name}: {final_report}")
    # e.g. "run123_gpt-4_test-match.jsonl: {'accuracy': 0.95}"

Extract Individual Sample Results

from pathlib import Path
from evals.utils.log_utils import extract_individual_results

metrics = extract_individual_results(
    Path("/tmp/evallogs/my_run.jsonl"),
    type_string="metrics",
)

for m in metrics[:5]:
    print(m)  # e.g. {"accuracy": 1.0, "f1": 0.92}

Related Pages

Implements Principle

Principle:Openai_Evals_Result_Aggregation

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment