Implementation:Openai Openai node Grader Models
| Knowledge Sources | |
|---|---|
| Domains | SDK, Evaluations, Grading |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The GraderModels module defines TypeScript types for evaluation grader models used to score and assess model outputs in the OpenAI evaluations system.
Description
The GraderModels resource class extends APIResource but contains no methods of its own; it serves primarily as a namespace for the comprehensive set of grader type definitions used by the OpenAI Evals API. The module defines six distinct grader types: StringCheckGrader for string comparison operations, TextSimilarityGrader for similarity metric-based grading, PythonGrader for custom Python script evaluation, ScoreModelGrader for model-based scoring, LabelModelGrader for model-based label assignment, and MultiGrader for combining multiple graders into a single score.
Each grader type is a strongly typed interface identified by a literal type field, with its own required fields and optional configuration. The GraderInputs type supports multimodal content: plain strings, input text, output text, images, and audio. The TextSimilarityGrader supports the cosine, fuzzy_match, BLEU, GLEU, and METEOR metrics as well as the ROUGE variants 1 through 5 and L.
The ScoreModelGrader and LabelModelGrader both accept message-based inputs whose role (user, assistant, system, or developer) sets the instruction hierarchy, and both support template strings for dynamic content injection. The ScoreModelGrader also accepts an optional score range and optional sampling parameters: temperature, top_p, seed, reasoning effort, and a completion token limit (max_completions_tokens).
Usage
Use these types when working with the OpenAI Evals API to define grading criteria for evaluating model outputs. These types are referenced when creating evaluation configurations, setting up automated testing pipelines, or building custom evaluation frameworks that leverage OpenAI's grading infrastructure.
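Because every grader interface carries a literal `type` field, a union of graders can be narrowed with an ordinary `switch`. The sketch below illustrates this with local, trimmed-down copies of three interfaces from the Signature section so that it runs without the SDK installed. `SimpleGrader` and `describeGrader` are hypothetical names used for illustration, not part of openai-node.

```typescript
// Local mirrors of three SDK interfaces; in a real project, use the types
// exported by the `openai` package instead of redefining them.
interface StringCheckGrader {
  type: 'string_check';
  name: string;
  input: string;
  reference: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
}

interface TextSimilarityGrader {
  type: 'text_similarity';
  name: string;
  input: string;
  reference: string;
  evaluation_metric: string; // a literal union in the SDK; widened here for brevity
}

interface PythonGrader {
  type: 'python';
  name: string;
  source: string;
  image_tag?: string;
}

// Hypothetical union used only in this sketch.
type SimpleGrader = StringCheckGrader | TextSimilarityGrader | PythonGrader;

// Hypothetical helper: summarises a grader config by switching on `type`.
// TypeScript narrows `grader` to the matching interface inside each case.
function describeGrader(grader: SimpleGrader): string {
  switch (grader.type) {
    case 'string_check':
      return `${grader.name}: ${grader.input} ${grader.operation} ${grader.reference}`;
    case 'text_similarity':
      return `${grader.name}: ${grader.evaluation_metric}(${grader.input}, ${grader.reference})`;
    case 'python':
      return `${grader.name}: python script (${grader.source.length} chars)`;
  }
}

const exactMatch: StringCheckGrader = {
  type: 'string_check',
  name: 'exact_match',
  input: '{{output}}',
  reference: '{{expected}}',
  operation: 'eq',
};

console.log(describeGrader(exactMatch));
```

Switching on `type` rather than probing for fields means the compiler flags any grader variant the helper forgets to handle.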
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/graders/grader-models.ts
Signature
export class GraderModels extends APIResource {}

export interface StringCheckGrader {
  input: string;
  name: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
  reference: string;
  type: 'string_check';
}

export interface TextSimilarityGrader {
  evaluation_metric: 'cosine' | 'fuzzy_match' | 'bleu' | 'gleu' | 'meteor'
    | 'rouge_1' | 'rouge_2' | 'rouge_3' | 'rouge_4' | 'rouge_5' | 'rouge_l';
  input: string;
  name: string;
  reference: string;
  type: 'text_similarity';
}

export interface PythonGrader {
  name: string;
  source: string;
  type: 'python';
  image_tag?: string;
}

export interface ScoreModelGrader {
  input: Array<ScoreModelGrader.Input>;
  model: string;
  name: string;
  type: 'score_model';
  range?: Array<number>;
  sampling_params?: ScoreModelGrader.SamplingParams;
}

export interface LabelModelGrader {
  input: Array<LabelModelGrader.Input>;
  labels: Array<string>;
  model: string;
  name: string;
  passing_labels: Array<string>;
  type: 'label_model';
}

export interface MultiGrader {
  calculate_output: string;
  graders: StringCheckGrader | TextSimilarityGrader | PythonGrader
    | ScoreModelGrader | LabelModelGrader;
  name: string;
  type: 'multi';
}
Import
import OpenAI from 'openai';
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | string | Yes (StringCheck/TextSimilarity) | The input text to grade (supports template strings); ScoreModel and LabelModel graders take an array of input messages instead |
| name | string | Yes | The name of the grader |
| reference | string | Yes (StringCheck/TextSimilarity) | The reference text to compare against |
| operation | 'eq' \| 'ne' \| 'like' \| 'ilike' | Yes (StringCheck) | The string comparison operation |
| evaluation_metric | 'cosine' \| 'fuzzy_match' \| 'bleu' \| ... | Yes (TextSimilarity) | The similarity metric to use |
| source | string | Yes (PythonGrader) | The Python script source code |
| model | string | Yes (ScoreModel/LabelModel) | The model to use for evaluation |
| labels | Array<string> | Yes (LabelModel) | Labels to assign during evaluation |
| passing_labels | Array<string> | Yes (LabelModel) | Labels that indicate a passing result |
| calculate_output | string | Yes (MultiGrader) | Formula to calculate combined output |
Outputs
| Name | Type | Description |
|---|---|---|
| type | string | The grader type identifier (e.g., 'string_check', 'text_similarity', 'python', 'score_model', 'label_model', 'multi') |
| name | string | The grader name |
Usage Examples
import OpenAI from 'openai';

// StringCheckGrader: exact match comparison
const stringCheckGrader: OpenAI.GraderModels.StringCheckGrader = {
  type: 'string_check',
  name: 'exact_match',
  input: '{{output}}',
  reference: '{{expected}}',
  operation: 'eq',
};

// TextSimilarityGrader: cosine similarity
const textSimilarityGrader: OpenAI.GraderModels.TextSimilarityGrader = {
  type: 'text_similarity',
  name: 'cosine_sim',
  input: '{{output}}',
  reference: '{{expected}}',
  evaluation_metric: 'cosine',
};

// PythonGrader: custom Python evaluation
const pythonGrader: OpenAI.GraderModels.PythonGrader = {
  type: 'python',
  name: 'custom_eval',
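  // Assumes the script's grade function receives (sample, item) and that the
  // dataset item carries an `expected` field; adjust to your own data schema.
  source: 'def grade(sample, item):\n    return float(sample["output_text"] == item["expected"])',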
};

// ScoreModelGrader: model-based scoring
const scoreModelGrader: OpenAI.GraderModels.ScoreModelGrader = {
  type: 'score_model',
  name: 'quality_score',
  model: 'gpt-4o',
  input: [
    { role: 'user', content: 'Rate the quality of: {{output}}' },
  ],
  range: [0, 1],
};
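The examples above do not cover the LabelModelGrader. The sketch below shows one possible configuration, using local copies of the relevant interfaces so that it runs without the SDK. The message shape is reduced to `role` and `content`, and `invalidPassingLabels` is a hypothetical client-side sanity check; whether the API itself rejects a passing label that is missing from `labels` is not stated in this document.

```typescript
// Local mirrors of the SDK types; the real LabelModelGrader.Input may carry
// more fields than the two shown here.
interface LabelModelInput {
  role: 'user' | 'assistant' | 'system' | 'developer';
  content: string;
}

interface LabelModelGrader {
  type: 'label_model';
  name: string;
  model: string;
  input: LabelModelInput[];
  labels: string[];
  passing_labels: string[];
}

// LabelModelGrader: model-based label assignment
const sentimentGrader: LabelModelGrader = {
  type: 'label_model',
  name: 'sentiment_label',
  model: 'gpt-4o',
  input: [
    { role: 'developer', content: 'Classify the sentiment of the text.' },
    { role: 'user', content: '{{output}}' },
  ],
  labels: ['positive', 'neutral', 'negative'],
  passing_labels: ['positive', 'neutral'],
};

// Hypothetical helper: returns passing labels that do not appear in `labels`,
// which would indicate a typo in the configuration.
function invalidPassingLabels(grader: LabelModelGrader): string[] {
  return grader.passing_labels.filter((label) => !grader.labels.includes(label));
}

console.log(invalidPassingLabels(sentimentGrader));
```

Running a check like this before submitting a configuration catches misspelled labels locally instead of at evaluation time.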
Key Types
GraderInputs
type GraderInputs = Array<
  | string
  | ResponseInputText
  | GraderInputs.OutputText
  | GraderInputs.InputImage
  | ResponseInputAudio
>;
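A GraderInputs array can mix plain strings with typed content parts, so consumers usually need to branch on each element. The sketch below is illustrative only: the field names of the four part interfaces are assumptions about the SDK typings and should be checked against grader-models.ts, and `collectText` is a hypothetical helper.

```typescript
// Assumed shapes for the members of GraderInputs (not copied from the SDK).
interface InputText {
  type: 'input_text';
  text: string;
}

interface OutputText {
  type: 'output_text';
  text: string;
}

interface InputImage {
  type: 'input_image';
  image_url: string;
  detail?: string;
}

interface InputAudio {
  type: 'input_audio';
  input_audio: { data: string; format: 'mp3' | 'wav' };
}

type GraderInputs = Array<string | InputText | OutputText | InputImage | InputAudio>;

// Hypothetical helper: joins only the textual parts of a GraderInputs array.
function collectText(inputs: GraderInputs): string {
  const parts: string[] = [];
  for (const part of inputs) {
    if (typeof part === 'string') {
      parts.push(part);
    } else if (part.type === 'input_text' || part.type === 'output_text') {
      parts.push(part.text);
    }
    // Image and audio parts carry no inline text, so they are skipped.
  }
  return parts.join('\n');
}

const mixed: GraderInputs = [
  'Compare the answer with the screenshot.',
  { type: 'output_text', text: '{{output}}' },
  { type: 'input_image', image_url: 'https://example.com/screenshot.png' },
];

console.log(collectText(mixed));
```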
ScoreModelGrader.SamplingParams
interface SamplingParams {
  max_completions_tokens?: number | null;
  reasoning_effort?: ReasoningEffort | null;
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}
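As a rough illustration of how these fields combine, the sketch below attaches low-variance sampling parameters to a ScoreModelGrader. It uses local copies of the types so that it runs on its own. The `ReasoningEffort` alias is an assumed subset of the SDK's union, `isScoreInRange` is a hypothetical helper, and its fallback to [0, 1] when `range` is omitted is an assumption about the default.

```typescript
// Local mirrors of the SDK types shown above.
type ReasoningEffort = 'low' | 'medium' | 'high'; // assumed subset of the SDK union

interface SamplingParams {
  max_completions_tokens?: number | null;
  reasoning_effort?: ReasoningEffort | null;
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}

interface ScoreModelInput {
  role: 'user' | 'assistant' | 'system' | 'developer';
  content: string;
}

interface ScoreModelGrader {
  type: 'score_model';
  name: string;
  model: string;
  input: ScoreModelInput[];
  range?: number[];
  sampling_params?: SamplingParams;
}

// Fixed seed and zero temperature to reduce run-to-run variation.
const reproducible: SamplingParams = {
  temperature: 0,
  seed: 42,
  max_completions_tokens: 256,
};

const qualityGrader: ScoreModelGrader = {
  type: 'score_model',
  name: 'quality_score',
  model: 'gpt-4o',
  input: [{ role: 'user', content: 'Rate the quality of: {{output}}' }],
  range: [0, 10],
  sampling_params: reproducible,
};

// Hypothetical helper: checks a score against the grader's configured range,
// falling back to [0, 1] when no range is set (an assumed default).
function isScoreInRange(score: number, grader: ScoreModelGrader): boolean {
  const [min = 0, max = 1] = grader.range ?? [];
  return score >= min && score <= max;
}

console.log(isScoreInRange(7, qualityGrader));
```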