Implementation:Openai Openai node Evals Resource
| Knowledge Sources | |
|---|---|
| Domains | SDK, Evals |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The Evals class is the evaluations resource of the openai-node SDK. It provides methods to create, retrieve, update, list, and delete evaluations, which define the testing criteria used to assess model performance.
Description
The Evals class extends APIResource and wraps the /evals REST endpoints. It is accessed via client.evals and provides full CRUD operations for managing evaluation definitions. An evaluation (Eval) represents a task to be tested against your LLM integration, such as improving chatbot quality, testing customer support scenarios, or comparing model performance.
Each evaluation is defined by a data_source_config and a set of testing_criteria. The data source config can be custom (with a user-defined JSON schema), logs (querying stored logs by metadata), or stored_completions (deprecated). Testing criteria are defined as graders, which can be of several types: LabelModelGrader (uses a model to assign labels), StringCheckGrader (string matching), TextSimilarityGrader (similarity metrics with a pass threshold), PythonGrader (runs a Python script), or ScoreModelGrader (uses a model to assign numeric scores).
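The three data source kinds described above can be modeled as a discriminated union on the type field. The following is a minimal sketch: the type aliases and the describeDataSource helper are simplified local stand-ins written for illustration, not the SDK's exported definitions, and the optional metadata field on the logs and stored_completions variants is an assumption.

```typescript
// Simplified local stand-ins for the data source config variants.
// Illustrative assumptions only; the SDK's own types are richer.
type CustomDataSource = {
  type: 'custom';
  item_schema: Record<string, unknown>; // user-defined JSON schema
};
type LogsDataSource = {
  type: 'logs';
  metadata?: Record<string, unknown>; // assumed: filters for stored logs
};
type StoredCompletionsDataSource = {
  type: 'stored_completions'; // deprecated, per the description above
  metadata?: Record<string, unknown>;
};
type DataSourceConfig =
  | CustomDataSource
  | LogsDataSource
  | StoredCompletionsDataSource;

// Returns a one-line summary; the switch is exhaustive over the union.
function describeDataSource(config: DataSourceConfig): string {
  switch (config.type) {
    case 'custom': {
      const props = config.item_schema['properties'];
      const keys = props && typeof props === 'object' ? Object.keys(props) : [];
      return `custom schema with fields: ${keys.join(', ')}`;
    }
    case 'logs':
      return 'logs filtered by metadata';
    case 'stored_completions':
      return 'stored_completions (deprecated)';
  }
}

const summary = describeDataSource({
  type: 'custom',
  item_schema: {
    type: 'object',
    properties: {
      question: { type: 'string' },
      expected_answer: { type: 'string' },
    },
  },
});
```

Switching on the type field lets the compiler narrow each branch, so item_schema is only reachable for the custom variant.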
The resource also exposes a runs sub-resource (of type Runs) for creating and managing evaluation runs that execute the evaluation against specific data and models. Key response types (EvalCreateResponse, EvalRetrieveResponse, EvalUpdateResponse, EvalListResponse) all share the same structure: id, created_at, data_source_config, metadata, name, object ('eval'), and testing_criteria.
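Because the response types share one structure, code that handles eval objects can be written once against that common shape. The sketch below uses a hypothetical local interface, EvalShape, and a runtime guard, isEvalShape; both are illustrations rather than SDK exports, and the created_at and metadata field types are assumptions.

```typescript
// Hypothetical local interface mirroring the fields listed above.
// The SDK's response types are more specific (for example, typed graders).
interface EvalShape {
  id: string;
  created_at: number; // assumed to be a numeric timestamp
  data_source_config: unknown;
  metadata: Record<string, string> | null; // assumed string-to-string map
  name: string;
  object: 'eval';
  testing_criteria: unknown[];
}

// Runtime check that a value carries the shared eval fields.
function isEvalShape(value: unknown): value is EvalShape {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v['id'] === 'string' &&
    typeof v['created_at'] === 'number' &&
    typeof v['name'] === 'string' &&
    v['object'] === 'eval' &&
    Array.isArray(v['testing_criteria']) &&
    'data_source_config' in v &&
    (v['metadata'] === null || typeof v['metadata'] === 'object')
  );
}

// Example values: one complete object and one missing most fields.
const complete: unknown = {
  id: 'eval_123',
  created_at: 1700000000,
  data_source_config: { type: 'custom' },
  metadata: null,
  name: 'Sample',
  object: 'eval',
  testing_criteria: [],
};
const looksLikeEval = isEvalShape(complete);
const partialLooksLikeEval = isEvalShape({ object: 'eval' });
```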
Usage
Use this resource to define and manage evaluation configurations. After creating an evaluation with appropriate data source config and graders, use the runs sub-resource to execute the evaluation against different models and parameters.
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/evals/evals.ts
Signature
export class Evals extends APIResource {
  runs: RunsAPI.Runs;

  create(body: EvalCreateParams, options?: RequestOptions): APIPromise<EvalCreateResponse>;
  retrieve(evalID: string, options?: RequestOptions): APIPromise<EvalRetrieveResponse>;
  update(evalID: string, body: EvalUpdateParams, options?: RequestOptions): APIPromise<EvalUpdateResponse>;
  list(
    query?: EvalListParams | null,
    options?: RequestOptions,
  ): PagePromise<EvalListResponsesPage, EvalListResponse>;
  delete(evalID: string, options?: RequestOptions): APIPromise<EvalDeleteResponse>;
}
Import
import OpenAI from 'openai';
// Access via client.evals
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_source_config (create) | Custom \| Logs \| StoredCompletions | Yes | Configuration for the data source; Custom requires an item_schema JSON schema |
| testing_criteria (create) | Array<LabelModel \| StringCheckGrader \| TextSimilarity \| Python \| ScoreModel> | Yes | List of graders defining how to evaluate results |
| name (create/update) | string | No | The name of the evaluation |
| metadata (create/update) | Metadata \| null | No | Up to 16 key-value pairs for structured storage |
| evalID (retrieve/update/delete) | string | Yes | The ID of the evaluation |
| order (list) | 'asc' \| 'desc' | No | Sort order for evals by timestamp |
| order_by (list) | 'created_at' \| 'updated_at' | No | Field to sort by |
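The 16-pair limit on metadata can be checked on the client before a create or update call is sent. This is a minimal sketch: checkMetadata and MAX_METADATA_PAIRS are hypothetical names introduced here, the string-to-string map type is an assumption, and only the pair count stated in the table is enforced.

```typescript
// Limit taken from the Inputs table; the constant name is illustrative.
const MAX_METADATA_PAIRS = 16;

// Returns a list of problems; an empty list means the metadata is acceptable.
function checkMetadata(metadata: Record<string, string> | null): string[] {
  if (metadata === null) return [];
  const problems: string[] = [];
  const count = Object.keys(metadata).length;
  if (count > MAX_METADATA_PAIRS) {
    problems.push(`too many pairs: ${count} > ${MAX_METADATA_PAIRS}`);
  }
  return problems;
}

// Example: a small metadata object passes the check.
const metadataProblems = checkMetadata({ version: 'v2' });
```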
Outputs
| Name | Type | Description |
|---|---|---|
| EvalCreateResponse | EvalCreateResponse | Created eval with id, created_at, data_source_config, metadata, name, object ('eval'), testing_criteria |
| EvalRetrieveResponse | EvalRetrieveResponse | Retrieved eval with same structure |
| EvalUpdateResponse | EvalUpdateResponse | Updated eval with same structure |
| EvalListResponse | EvalListResponse | Paginated list item with same structure |
| EvalDeleteResponse | EvalDeleteResponse | Object with deleted (boolean), eval_id, and object fields |
Usage Examples
import OpenAI from 'openai';

const client = new OpenAI();

// Create an evaluation with a custom data source and string check grader
const eval_ = await client.evals.create({
  name: 'My Chatbot Quality Eval',
  data_source_config: {
    type: 'custom',
    item_schema: {
      type: 'object',
      properties: {
        question: { type: 'string' },
        expected_answer: { type: 'string' },
      },
      required: ['question', 'expected_answer'],
    },
  },
  testing_criteria: [
    {
      type: 'string_check',
      name: 'exact_match',
      input: '{{sample.output_text}}',
      reference: '{{item.expected_answer}}',
      operation: 'eq',
    },
  ],
});
console.log(eval_.id);

// Retrieve an evaluation
const retrieved = await client.evals.retrieve(eval_.id);

// Update an evaluation
const updated = await client.evals.update(eval_.id, {
  name: 'Renamed Eval',
  metadata: { version: 'v2' },
});

// List evaluations
for await (const e of client.evals.list({ order: 'desc' })) {
  console.log(e.id, e.name);
}

// Delete an evaluation
const deleted = await client.evals.delete(eval_.id);
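The listing loop above walks every evaluation. When only the first few are needed, a small generic helper can stop the iteration early. The sketch below is an assumption-laden illustration: takeFirst is a hypothetical helper, and fakeEvalList is a stand-in generator used in place of client.evals.list() so the example runs without network access.

```typescript
// Collects at most `limit` items from any async iterable, then stops iterating.
async function takeFirst<T>(items: AsyncIterable<T>, limit: number): Promise<T[]> {
  const out: T[] = [];
  if (limit <= 0) return out;
  for await (const item of items) {
    out.push(item);
    if (out.length >= limit) break;
  }
  return out;
}

// Hypothetical stand-in for client.evals.list(): yields five fake eval summaries.
async function* fakeEvalList(): AsyncGenerator<{ id: string; name: string }> {
  for (let i = 1; i <= 5; i++) {
    yield { id: `eval_${i}`, name: `Eval ${i}` };
  }
}

// A promise for the first two items produced by the stand-in generator.
const firstTwo: Promise<{ id: string; name: string }[]> = takeFirst(fakeEvalList(), 2);
```

Because the helper accepts any AsyncIterable, the value returned by client.evals.list() could be passed in the same way it is consumed by the for await loop above.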