Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Openai Evals Get Final Results From Dir

From Leeroopedia
Revision as of 13:34, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Openai_Evals_Get_Final_Results_From_Dir.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Evaluation, Data_Analysis
Last Updated 2026-02-14 10:00 GMT

Overview

Concrete tool for extracting final results from evaluation log files provided by the evals log utilities module.

Description

The get_final_results_from_dir function scans a directory for JSONL log files, extracts the final_report entry from each, and returns a dictionary mapping file paths to their results. Companion functions extract_final_results and extract_individual_results provide single-file extraction and sample-level metric extraction respectively.

Usage

Use these functions after completing evaluation runs to programmatically access results. Useful for building comparison tables, dashboards, or automated reporting.

Code Reference

Source Location

  • Repository: openai/evals
  • File: evals/utils/log_utils.py (lines 6-61)

Signature

def get_final_results_from_dir(log_dir: Union[str, Path]) -> dict[Path, dict]:
    """
    Given a directory of log files, return a dictionary mapping
    log file paths to final results.

    Args:
        log_dir: Path to directory containing .log files.

    Returns:
        Dictionary mapping each log file Path to its final_report dict.
    """

def extract_final_results(path: Path) -> dict:
    """
    Given a path to a log file, find and return the "final_report" dictionary.

    Args:
        path: Path to a single JSONL log file.

    Returns:
        The final_report dictionary.

    Raises:
        ValueError: If no final_report found in the file.
    """

def extract_individual_results(path: Path, type_string: str = "metrics") -> list[dict]:
    """
    Given a path to a log file, grab all the individual sample results.

    Args:
        path: Path to a single JSONL log file.
        type_string: Event type to filter (default "metrics").

    Returns:
        List of data dictionaries from matching events.
    """

Import

from evals.utils.log_utils import (
    get_final_results_from_dir,
    extract_final_results,
    extract_individual_results,
)

I/O Contract

Inputs

Name Type Required Description
log_dir Union[str, Path] Yes (for get_final_results_from_dir) Directory containing .log files
path Path Yes (for extract_*) Path to individual log file
type_string str No Event type to filter (default "metrics")

Outputs

Name Type Description
get_final_results_from_dir dict[Path, dict] Map of log file path to final_report dict
extract_final_results dict Single final_report dictionary
extract_individual_results list[dict] List of per-sample result dictionaries

Usage Examples

Aggregate Results from a Directory

from pathlib import Path
from evals.utils.log_utils import get_final_results_from_dir

results = get_final_results_from_dir("/tmp/evallogs/")

for log_path, final_report in results.items():
    print(f"{log_path.name}: {final_report}")
    # e.g. "run123_gpt-4_test-match.jsonl: {'accuracy': 0.95}"

Extract Individual Sample Results

from pathlib import Path
from evals.utils.log_utils import extract_individual_results

metrics = extract_individual_results(
    Path("/tmp/evallogs/my_run.jsonl"),
    type_string="metrics",
)

for m in metrics[:5]:
    print(m)  # e.g. {"accuracy": 1.0, "f1": 0.92}

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment