Implementation:Openai Evals Get Final Results From Dir
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Analysis |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
Concrete tool, provided by the evals log utilities module, for extracting final results from evaluation log files.
Description
The get_final_results_from_dir function scans a directory for .log files in JSON Lines format (one JSON object per line), extracts the final_report entry from each, and returns a dictionary mapping each file path to its final report. Two companion functions work on a single file: extract_final_results returns that file's final_report, and extract_individual_results returns the data of every event of a given type (by default "metrics"), which gives sample-level results.
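To make the extraction step concrete, here is a minimal, self-contained sketch of scanning one JSON Lines log for its final report. It is an illustration and not the library's code: the helper name find_final_report is hypothetical, and it assumes the report is stored under a top-level "final_report" key on one of the lines. Unlike extract_final_results, which is documented to raise ValueError, this sketch returns None when nothing is found.

```python
import json
from pathlib import Path
from typing import Optional


def find_final_report(path: Path) -> Optional[dict]:
    """Hypothetical helper: return the first "final_report" payload in a log, or None."""
    with path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except ValueError:
                # Skip lines that are not valid JSON (JSONDecodeError is a ValueError).
                continue
            # Assumption: the report sits under a top-level "final_report" key.
            if isinstance(event, dict) and "final_report" in event:
                return event["final_report"]
    return None
```

Returning None keeps the sketch easy to test; raising an error, as the documented function does, is the stricter choice when a missing report should stop the analysis.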
Usage
Use these functions after evaluation runs have completed to access the results programmatically. They are useful for building comparison tables, dashboards, or automated reports.
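As one possible shape for a comparison table, the sketch below formats a mapping of the kind get_final_results_from_dir returns (log path to final_report dict) as plain text. The function name format_comparison_table and the sample values are hypothetical; it assumes each report is a flat dict of metric names to values.

```python
from pathlib import Path


def format_comparison_table(results: dict[Path, dict]) -> str:
    """Hypothetical helper: one row per log file, one column per metric name."""
    # Collect every metric name that appears in any report.
    metric_names = sorted({name for report in results.values() for name in report})
    rows = [["run"] + metric_names]
    for log_path in sorted(results):
        report = results[log_path]
        # Use "-" where a run did not report a given metric.
        rows.append([log_path.name] + [str(report.get(name, "-")) for name in metric_names])
    return "\n".join(" | ".join(row) for row in rows)


# Made-up data for illustration only.
sample = {
    Path("/logs/b.log"): {"accuracy": 0.9},
    Path("/logs/a.log"): {"accuracy": 0.95, "f1": 0.8},
}
print(format_comparison_table(sample))
```

In real use, the sample dict would be replaced by the return value of get_final_results_from_dir.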
Code Reference
Source Location
- Repository: openai/evals
- File: evals/utils/log_utils.py (lines 6-61)
Signature
def get_final_results_from_dir(log_dir: Union[str, Path]) -> dict[Path, dict]:
    """
    Given a directory of log files, return a dictionary mapping
    log file paths to final results.

    Args:
        log_dir: Path to directory containing .log files.

    Returns:
        Dictionary mapping each log file Path to its final_report dict.
    """

def extract_final_results(path: Path) -> dict:
    """
    Given a path to a log file, find and return the "final_report" dictionary.

    Args:
        path: Path to a single JSONL log file.

    Returns:
        The final_report dictionary.

    Raises:
        ValueError: If no final_report found in the file.
    """

def extract_individual_results(path: Path, type_string: str = "metrics") -> list[dict]:
    """
    Given a path to a log file, grab all the individual sample results.

    Args:
        path: Path to a single JSONL log file.
        type_string: Event type to filter (default "metrics").

    Returns:
        List of data dictionaries from matching events.
    """
Import
from evals.utils.log_utils import (
    get_final_results_from_dir,
    extract_final_results,
    extract_individual_results,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| log_dir | Union[str, Path] | Yes (for get_final_results_from_dir) | Directory containing .log files |
| path | Path | Yes (for extract_*) | Path to individual log file |
| type_string | str | No | Event type to filter (default "metrics") |
Outputs
| Name | Type | Description |
|---|---|---|
| get_final_results_from_dir | dict[Path, dict] | Map of log file path to final_report dict |
| extract_final_results | dict | Single final_report dictionary |
| extract_individual_results | list[dict] | List of per-sample result dictionaries |
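Because extract_final_results is documented to raise ValueError when a file has no final_report, a batch over many logs (for example, one that includes an interrupted run) may need to tolerate that error. The sketch below is one way to do so; collect_reports is a hypothetical helper, and the extractor is passed in as a parameter so the example stays self-contained. Whether get_final_results_from_dir itself propagates the error is not stated in this page.

```python
from pathlib import Path
from typing import Callable


def collect_reports(
    log_paths: list[Path],
    extract: Callable[[Path], dict],
) -> tuple[dict[Path, dict], list[Path]]:
    """Hypothetical helper: split logs into those with a report and those without."""
    reports: dict[Path, dict] = {}
    missing: list[Path] = []
    for log_path in log_paths:
        try:
            reports[log_path] = extract(log_path)
        except ValueError:
            # Documented failure mode: no final_report found in the file.
            missing.append(log_path)
    return reports, missing
```

In real use, extract would be extract_final_results and log_paths could come from Path(log_dir).glob(...) with whatever pattern matches the logs of interest.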
Usage Examples
Aggregate Results from a Directory
from pathlib import Path
from evals.utils.log_utils import get_final_results_from_dir

# Collect the final report of every .log file under the directory.
results = get_final_results_from_dir("/tmp/evallogs/")
for log_path, final_report in results.items():
    print(f"{log_path.name}: {final_report}")
    # e.g. "run123_gpt-4_test-match.log: {'accuracy': 0.95}"
Extract Individual Sample Results
from pathlib import Path
from evals.utils.log_utils import extract_individual_results
metrics = extract_individual_results(
    Path("/tmp/evallogs/my_run.jsonl"),
    type_string="metrics",
)
for m in metrics[:5]:
    print(m)  # e.g. {"accuracy": 1.0, "f1": 0.92}
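Once the per-sample dictionaries are in hand, a common next step is to average each metric across samples. The sketch below assumes each dictionary is flat and maps metric names to numbers, which may not hold for every eval; mean_metrics is a hypothetical helper, not part of log_utils.

```python
from collections import defaultdict
from statistics import fmean


def mean_metrics(samples: list[dict]) -> dict[str, float]:
    """Hypothetical helper: average each numeric metric over all samples that report it."""
    values: dict[str, list[float]] = defaultdict(list)
    for sample in samples:
        for name, value in sample.items():
            # Ignore non-numeric fields; bool is a subclass of int, so True/False count as 1/0.
            if isinstance(value, (int, float)):
                values[name].append(float(value))
    return {name: fmean(vals) for name, vals in values.items()}
```

For example, mean_metrics(metrics) on the list returned by extract_individual_results would give one averaged value per metric name, computed only over the samples that include that metric.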