Principle:Openai Evals Eval Progress Tracking
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Reliability |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
A checkpoint-based progress tracking mechanism that enables resumption of interrupted batch evaluation runs.
Description
Eval Progress Tracking maintains a persistent record of completed evaluation commands in a JSON Lines file. When a batch run is interrupted (by an error, a timeout, or manual cancellation), the progress file lets the run resume after the last completed eval rather than restart from scratch. Each completed command is recorded on its own line as a JSON array of the command's tokens. On resume, the recorded commands are compared against the full command list, and any command already present in the file is skipped.
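To make the record format concrete, the following minimal sketch shows how one command could be written as a single JSON Lines record and read back. The command tokens (eval and model names) are hypothetical and chosen only for illustration; they are not taken from a real run.

```python
import json

# Hypothetical command tokens, for illustration only.
command = ["oaieval", "gpt-3.5-turbo", "test-match"]

# One completed command becomes one line: a JSON array of its tokens.
record = json.dumps(command)

# Reading the line back yields an equal list, so a recorded command can be
# compared directly against entries in the full command list.
restored = json.loads(record)
already_done = restored == command
```

Because each record is a plain list of strings, the comparison on resume is exact: a command counts as completed only if every token matches.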
Usage
Progress tracking is used automatically by oaievalset when the --resume flag is enabled (the default). The progress file is stored at /tmp/oaievalset/{model}.{eval_set}.progress.txt.
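As a small illustration of the path template above, the sketch below fills in the two placeholders. The helper name `progress_path` and the example model and eval-set names are assumptions made for this example, not part of oaievalset.

```python
from pathlib import PurePosixPath


def progress_path(model: str, eval_set: str) -> PurePosixPath:
    """Fill in the documented template /tmp/oaievalset/{model}.{eval_set}.progress.txt."""
    # Hypothetical helper: it only mirrors the path template from the text.
    return PurePosixPath("/tmp/oaievalset") / f"{model}.{eval_set}.progress.txt"


# Example with made-up model and eval-set names.
example = progress_path("gpt-3.5-turbo", "test")
```

Since the file name depends on both the model and the eval set, each model and eval-set pair keeps its own independent progress record.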
Theoretical Basis
The progress tracking follows a simple append-log pattern:
- On start, load existing progress file (if it exists and resume is enabled)
- Before execution, filter command list to exclude already-completed commands
- After each successful eval, append the command to the progress file
- On resume, the file is re-read to reconstruct the completed set, so the filtering step skips every command that already finished
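The steps above can be sketched as follows. This is a minimal illustration of the append-log pattern under stated assumptions, not the oaievalset implementation: the function names, the `run` callback, the choice to stop at the first failed command, and the choice to discard an old progress file when resume is disabled are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

Command = list[str]  # one eval command, stored as its list of tokens


def load_completed(progress_file: Path) -> list[Command]:
    """Rebuild the completed set from the progress file, if it exists."""
    if not progress_file.exists():
        return []
    with progress_file.open() as f:
        return [json.loads(line) for line in f if line.strip()]


def record_completed(progress_file: Path, command: Command) -> None:
    """Append one completed command as a single JSON Lines record."""
    progress_file.parent.mkdir(parents=True, exist_ok=True)
    with progress_file.open("a") as f:
        f.write(json.dumps(command) + "\n")


def run_pending(
    commands: list[Command],
    progress_file: Path,
    run: Callable[[Command], bool],
    resume: bool = True,
) -> list[Command]:
    """Run the commands not yet recorded; return those completed in this call."""
    if resume:
        completed = load_completed(progress_file)
    else:
        # Assumption of this sketch: without resume, start from a clean log.
        progress_file.unlink(missing_ok=True)
        completed = []

    # Filter out commands that an earlier run already finished.
    pending = [c for c in commands if c not in completed]

    finished: list[Command] = []
    for command in pending:
        if not run(command):
            break  # assumption: stop at the first failure, leave the rest pending
        # Record only after success, so a crash never marks unfinished work done.
        record_completed(progress_file, command)
        finished.append(command)
    return finished
```

Recording a command only after it succeeds is what makes the log safe to replay: an interruption in the middle of an eval leaves that eval out of the file, so it runs again on the next attempt.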