Principle:Openai Evals Eval Registration

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Configuration
Last Updated: 2026-02-14 10:00 GMT

Overview

A YAML-based configuration pattern that registers evaluation classes with the framework by defining their class path, constructor arguments, and metadata.

Description

Eval Registration is the mechanism that makes new evaluations discoverable to the oaieval CLI and the Registry system. Each eval is defined as a YAML entry in a file under evals/registry/evals/ that specifies the fully-qualified Python class path and the constructor arguments. Registration uses a two-level hierarchy: a base eval entry (holding the id of its default split, the metrics, and a description) and a versioned split entry (holding the actual class and args). Aliases allow one eval name to point to another, so the default version can change without renaming the eval.
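The lookup described above can be sketched as a small resolver that follows id pointers until it reaches an entry that defines a class. This is a minimal illustration, not the framework's own code: the REGISTRY dict stands in for the parsed YAML files, and resolve_eval is a hypothetical helper name.

```python
from typing import Any, Dict

# In-memory stand-in for the parsed YAML entries (illustrative values only;
# the real registry loads these from files under evals/registry/evals/).
REGISTRY: Dict[str, Dict[str, Any]] = {
    "my-eval": {"id": "my-eval.dev.v0", "metrics": ["accuracy"]},
    "my-eval.dev.v0": {
        "class": "evals.elsuite.basic.match:Match",
        "args": {"samples_jsonl": "my_data/test.jsonl"},
    },
}


def resolve_eval(name: str, registry: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Follow `id` pointers until an entry that defines `class` is found."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"alias cycle detected at {name!r}")
        seen.add(name)
        if name not in registry:
            raise KeyError(f"eval {name!r} is not registered")
        entry = registry[name]
        if "class" in entry:
            # Reached a concrete split: return it together with its full name.
            return {"name": name, **entry}
        if "id" not in entry:
            raise ValueError(f"{name!r} has neither `class` nor `id`")
        name = entry["id"]  # hop to the entry this one points at


resolved = resolve_eval("my-eval", REGISTRY)
print(resolved["name"], resolved["class"])
```

Tracking the names already visited keeps a misconfigured pair of entries that point at each other from looping forever.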

Usage

Register an eval after implementing an Eval class or when configuring a built-in template with new data. This is the step that makes the eval available via oaieval <model> <eval-name>.
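When a built-in template is reused with new data, the registration text is mostly boilerplate. The sketch below renders the two entries as YAML text using only string formatting. render_registration is a hypothetical helper, it handles plain scalar argument values only (no quoting or nesting), and the colon-separated class path assumes the form used in the openai/evals registry files.

```python
from typing import Dict, Sequence, Union

Scalar = Union[str, int, float]


def render_registration(
    name: str,
    split: str,
    class_path: str,
    args: Dict[str, Scalar],
    description: str,
    metrics: Sequence[str] = ("accuracy",),
) -> str:
    """Return YAML text for a base eval entry plus one versioned split entry."""
    if "." in name:
        raise ValueError("base eval name must not contain dots")
    split_id = f"{name}.{split}"
    lines = [
        f"{name}:",
        f"  id: {split_id}",
        f"  metrics: [{', '.join(metrics)}]",
        f'  description: "{description}"',
        "",
        f"{split_id}:",
        f"  class: {class_path}",
        "  args:",
    ]
    # One indented line per constructor argument, in insertion order.
    lines += [f"    {key}: {value}" for key, value in args.items()]
    return "\n".join(lines) + "\n"


text = render_registration(
    name="my-eval",
    split="dev.v0",
    class_path="evals.elsuite.basic.match:Match",
    args={"samples_jsonl": "my_data/test.jsonl", "max_tokens": 100},
    description="My custom evaluation",
)
print(text)
```

Saving the rendered text to a YAML file under evals/registry/evals/ is what registers the eval, after which it can be run as oaieval <model> my-eval.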

Theoretical Basis

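The registration follows a declarative configuration pattern: the YAML entries state which class to run and with which arguments, and the framework takes care of loading and instantiating it.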

Two-level YAML structure:

# Level 1: Base eval (metadata)
my-eval:
  id: my-eval.dev.v0
  metrics: [accuracy]
  description: "My custom evaluation"

# Level 2: Versioned split (implementation)
my-eval.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: my_data/test.jsonl
    max_tokens: 100

Key conventions:

  • Base eval name has no dots (e.g. "my-eval")
  • Splits use dot notation (e.g. "my-eval.dev.v0")
  • The id field in the base eval points to the default split
  • class must be a fully-qualified Python class path in module.path:ClassName form, with a colon before the class name
  • args are passed as keyword arguments to the class constructor
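The conventions above can be sketched as two small helpers: one that checks a parsed registry against the naming rules, and one that imports the class and passes args as keyword arguments. Both check_conventions and instantiate are hypothetical names, and the example uses the standard-library fractions.Fraction as a stand-in class so the sketch runs without the evals package installed. The loader accepts either a colon or a final dot before the class name, which is an assumption made for illustration.

```python
import importlib
from typing import Any, Dict, List


def check_conventions(registry: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, entry in registry.items():
        if "." not in name:
            # Base eval: its id must point at a registered split.
            target = entry.get("id")
            if target is None:
                problems.append(f"{name}: base eval has no id")
            elif target not in registry:
                problems.append(f"{name}: id {target!r} is not registered")
        elif "class" not in entry:
            # Split entry: must name the class that implements it.
            problems.append(f"{name}: split has no class")
    return problems


def instantiate(entry: Dict[str, Any], **extra: Any) -> Any:
    """Import the class named in `entry` and call it with args as kwargs."""
    module_name, _, class_name = entry["class"].partition(":")
    if not class_name:
        # No colon present: fall back to splitting at the last dot.
        module_name, _, class_name = entry["class"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    # `extra` stands in for any arguments the caller supplies on top of
    # the ones declared in the YAML entry.
    return cls(**entry.get("args", {}), **extra)


# Stand-in registry: same shape as the YAML, but with a stdlib class.
registry = {
    "my-eval": {"id": "my-eval.dev.v0", "metrics": ["accuracy"]},
    "my-eval.dev.v0": {
        "class": "fractions:Fraction",
        "args": {"numerator": 3, "denominator": 4},
    },
}

print(check_conventions(registry))
print(instantiate(registry["my-eval.dev.v0"]))
```

Keeping the check separate from instantiation means a registry file can be validated without importing any eval code.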

Related Pages

Implemented By
