Implementation:Openai Openai node Evals Resource

Knowledge Sources	Openai_Openai_node
Domains	SDK, Evals
Last Updated	2026-02-15 12:00 GMT

Overview

The Evals class implements the Evals resource in the openai-node SDK. It provides methods to create, retrieve, update, list, and delete evaluations, which define the testing criteria used to assess model performance.

Description

The Evals class extends APIResource and wraps the /evals REST endpoints. It is accessed via client.evals and provides full CRUD operations for managing evaluation definitions. An evaluation (Eval) represents a task to be tested against your LLM integration, such as improving chatbot quality, testing customer support scenarios, or comparing model performance.

Each evaluation is defined by a data_source_config and a set of testing_criteria. The data source config can be custom (with a user-defined JSON schema), logs (querying stored logs by metadata), or stored_completions (deprecated). Testing criteria are defined as graders, which can be of several types: LabelModelGrader (uses a model to assign labels), StringCheckGrader (string matching), TextSimilarityGrader (similarity metrics with a pass threshold), PythonGrader (runs a Python script), or ScoreModelGrader (uses a model to assign numeric scores).
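To make the difference between grader types concrete, the sketch below models three of them as a discriminated union keyed on type. The interfaces are simplified local stand-ins written for this page, not the SDK's exported types, and field names such as evaluation_metric, pass_threshold, and source are assumptions about the request shape; the definitions in src/resources/evals/evals.ts are authoritative.

```typescript
// Simplified local stand-ins for three grader shapes (NOT the SDK's exported types).
interface StringCheckGraderSketch {
  type: 'string_check';
  name: string;
  input: string; // template, e.g. '{{sample.output_text}}'
  reference: string; // template, e.g. '{{item.expected_answer}}'
  operation: 'eq' | 'ne' | 'like' | 'ilike';
}

interface TextSimilarityGraderSketch {
  type: 'text_similarity';
  name: string;
  input: string;
  reference: string;
  evaluation_metric: string; // assumed field name, e.g. 'fuzzy_match'
  pass_threshold: number; // assumed field name for the pass threshold
}

interface PythonGraderSketch {
  type: 'python';
  name: string;
  source: string; // assumed field name for the Python grading script
}

type GraderSketch = StringCheckGraderSketch | TextSimilarityGraderSketch | PythonGraderSketch;

// Narrow on the `type` discriminator to produce a one-line description.
function describeGrader(g: GraderSketch): string {
  switch (g.type) {
    case 'string_check':
      return `${g.name}: ${g.input} ${g.operation} ${g.reference}`;
    case 'text_similarity':
      return `${g.name}: ${g.evaluation_metric}, pass_threshold=${g.pass_threshold}`;
    case 'python':
      return `${g.name}: python script (${g.source.length} chars)`;
  }
}

const criteria: GraderSketch[] = [
  {
    type: 'string_check',
    name: 'exact_match',
    input: '{{sample.output_text}}',
    reference: '{{item.expected_answer}}',
    operation: 'eq',
  },
  {
    type: 'text_similarity',
    name: 'fuzzy',
    input: '{{sample.output_text}}',
    reference: '{{item.expected_answer}}',
    evaluation_metric: 'fuzzy_match',
    pass_threshold: 0.8,
  },
];

const descriptions = criteria.map(describeGrader);
console.log(descriptions.join('\n'));
```

Because every grader carries a literal type field, a switch on that field lets the compiler check that each variant is handled.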

The resource also exposes a runs sub-resource (of type Runs) for creating and managing evaluation runs that execute the evaluation against specific data and models. Key response types (EvalCreateResponse, EvalRetrieveResponse, EvalUpdateResponse, EvalListResponse) all share the same structure: id, created_at, data_source_config, metadata, name, object ('eval'), and testing_criteria.
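Since the four response types share one structure, a single helper can handle any of them. The following is a minimal sketch under that assumption: EvalLike is a hypothetical local interface that mirrors only the shared fields listed above, summarizeEval is an illustrative name, and created_at is assumed to be a Unix timestamp in seconds.

```typescript
// Hypothetical local mirror of the fields shared by the eval response types.
interface EvalLike {
  id: string;
  created_at: number; // assumed: Unix timestamp in seconds
  name: string;
  object: 'eval';
  metadata: Record<string, string> | null;
  data_source_config: { type: string };
  testing_criteria: Array<{ type: string; name: string }>;
}

// Build a one-line summary from the shared fields only.
function summarizeEval(e: EvalLike): string {
  const created = new Date(e.created_at * 1000).toISOString();
  const graders = e.testing_criteria.map((g) => `${g.name}(${g.type})`).join(', ');
  return `${e.id} "${e.name}" created=${created} source=${e.data_source_config.type} graders=[${graders}]`;
}

// Hand-written sample value; a real one would come from create, retrieve, update, or list.
const sampleEval: EvalLike = {
  id: 'eval_123',
  created_at: 0,
  name: 'Demo',
  object: 'eval',
  metadata: null,
  data_source_config: { type: 'custom' },
  testing_criteria: [{ type: 'string_check', name: 'exact_match' }],
};

const summary = summarizeEval(sampleEval);
console.log(summary);
```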

Usage

Use this resource to define and manage evaluation configurations. After creating an evaluation with appropriate data source config and graders, use the runs sub-resource to execute the evaluation against different models and parameters.
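The hand-off from an evaluation to a run can be sketched as follows. The request shape here (a jsonl data source with inline file_content items) is an assumption about the runs API, and JsonlRunBodySketch and buildJsonlRunBody are hypothetical names introduced for illustration; consult the Runs resource for the parameters it actually accepts.

```typescript
// Hypothetical local type for a run request body (assumed shape, not the SDK's type).
interface JsonlRunBodySketch {
  name: string;
  data_source: {
    type: 'jsonl';
    source: {
      type: 'file_content';
      content: Array<{ item: Record<string, unknown> }>;
    };
  };
}

// Wrap plain rows into the assumed inline-content structure.
function buildJsonlRunBody(
  name: string,
  items: Array<Record<string, unknown>>,
): JsonlRunBodySketch {
  return {
    name,
    data_source: {
      type: 'jsonl',
      source: { type: 'file_content', content: items.map((item) => ({ item })) },
    },
  };
}

// Rows follow the item_schema of the evaluation they will be run against.
const runBody = buildJsonlRunBody('baseline-run', [
  { question: 'What is 2 + 2?', expected_answer: '4' },
]);
console.log(JSON.stringify(runBody, null, 2));

// With a real client, the body would be passed along the lines of:
//   const run = await client.evals.runs.create(evalId, runBody);
```

Building the body as plain data keeps it easy to inspect or log before any request is sent.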

Code Reference

Source Location

Repository: openai-node
File: src/resources/evals/evals.ts

Signature

export class Evals extends APIResource {
  runs: RunsAPI.Runs;

  create(body: EvalCreateParams, options?: RequestOptions): APIPromise<EvalCreateResponse>;

  retrieve(evalID: string, options?: RequestOptions): APIPromise<EvalRetrieveResponse>;

  update(evalID: string, body: EvalUpdateParams, options?: RequestOptions): APIPromise<EvalUpdateResponse>;

  list(
    query?: EvalListParams | null,
    options?: RequestOptions,
  ): PagePromise<EvalListResponsesPage, EvalListResponse>;

  delete(evalID: string, options?: RequestOptions): APIPromise<EvalDeleteResponse>;
}

Import

import OpenAI from 'openai';
// Access via client.evals

I/O Contract

Inputs

Name	Type	Required	Description
data_source_config (create)	Custom \| Logs \| StoredCompletions	Yes	Configuration for the data source; Custom requires an item_schema JSON schema
testing_criteria (create)	Array<LabelModel \| StringCheckGrader \| TextSimilarity \| Python \| ScoreModel>	Yes	List of graders defining how to evaluate results
name (create/update)	`string`	No	The name of the evaluation
metadata (create/update)	Metadata \| null	No	Up to 16 key-value pairs for structured storage
evalID (retrieve/update/delete)	`string`	Yes	The ID of the evaluation
order (list)	'asc' \| 'desc'	No	Sort order for evals by timestamp
order_by (list)	'created_at' \| 'updated_at'	No	Field to sort by
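
The 16-pair limit on metadata can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not something the SDK is known to provide; it assumes metadata values are strings and checks only the pair count stated in the table above.

```typescript
// Hypothetical client-side check; the limit of 16 pairs comes from the table above.
const MAX_METADATA_PAIRS = 16;

// Return a list of problems; an empty list means the metadata passed this check.
function checkMetadata(metadata: Record<string, string> | null): string[] {
  if (metadata === null) return [];
  const problems: string[] = [];
  const keys = Object.keys(metadata);
  if (keys.length > MAX_METADATA_PAIRS) {
    problems.push(`too many pairs: ${keys.length} > ${MAX_METADATA_PAIRS}`);
  }
  return problems;
}

// Build an oversized metadata object with 17 entries for demonstration.
const oversized: Record<string, string> = {};
for (let i = 0; i < 17; i++) {
  oversized[`k${i}`] = 'x';
}

const okProblems = checkMetadata({ version: 'v2' });
const oversizedProblems = checkMetadata(oversized);
console.log(okProblems, oversizedProblems);
```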

Outputs

Name	Type	Description
EvalCreateResponse	`EvalCreateResponse`	Created eval with id, created_at, data_source_config, metadata, name, object ('eval'), testing_criteria
EvalRetrieveResponse	`EvalRetrieveResponse`	Retrieved eval with same structure
EvalUpdateResponse	`EvalUpdateResponse`	Updated eval with same structure
EvalListResponse	`EvalListResponse`	Paginated list item with same structure
EvalDeleteResponse	`EvalDeleteResponse`	Object with deleted (boolean), eval_id, and object fields

Usage Examples

import OpenAI from 'openai';

const client = new OpenAI();

// Create an evaluation with a custom data source and string check grader
const eval_ = await client.evals.create({
  name: 'My Chatbot Quality Eval',
  data_source_config: {
    type: 'custom',
    item_schema: {
      type: 'object',
      properties: {
        question: { type: 'string' },
        expected_answer: { type: 'string' },
      },
      required: ['question', 'expected_answer'],
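    },
    // Assumed to be needed so grader templates can reference the sample namespace ({{sample.*}})
    include_sample_schema: true,
  },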
  testing_criteria: [
    {
      type: 'string_check',
      name: 'exact_match',
      input: '{{sample.output_text}}',
      reference: '{{item.expected_answer}}',
      operation: 'eq',
    },
  ],
});
console.log(eval_.id);

// Retrieve an evaluation
const retrieved = await client.evals.retrieve(eval_.id);

// Update an evaluation
const updated = await client.evals.update(eval_.id, {
  name: 'Renamed Eval',
  metadata: { version: 'v2' },
});

// List evaluations
for await (const e of client.evals.list({ order: 'desc' })) {
  console.log(e.id, e.name);
}

// Delete an evaluation
const deleted = await client.evals.delete(eval_.id);

Related Pages

Environment:Openai_Openai_node_Node_20_Runtime
