Implementation:Openai Openai node Grader Models

Knowledge Sources	Openai_Openai_node
Domains	SDK, Evaluations, Grading
Last Updated	2026-02-15 12:00 GMT

Overview

The GraderModels module defines TypeScript types for evaluation grader models used to score and assess model outputs in the OpenAI evaluations system.

Description

The GraderModels resource class extends APIResource but contains no methods of its own; it serves primarily as a namespace for the comprehensive set of grader type definitions used by the OpenAI Evals API. The module defines six distinct grader types: StringCheckGrader for string comparison operations, TextSimilarityGrader for similarity metric-based grading, PythonGrader for custom Python script evaluation, ScoreModelGrader for model-based scoring, LabelModelGrader for model-based label assignment, and MultiGrader for combining multiple graders into a single score.
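Because each grader interface carries a literal `type` field, the grader types can be handled as a discriminated union. The sketch below shows that pattern for three of the six graders. It uses trimmed local mirrors of the interfaces from the Signature section so that it compiles without the SDK installed; `AnyGrader` and `describeGrader` are hypothetical names for illustration and are not exported by openai-node.

```typescript
// Local, trimmed mirrors of interfaces from the Signature section.
// Declared here only so the sketch compiles without the SDK.
interface StringCheckGrader {
  type: 'string_check';
  name: string;
  input: string;
  reference: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
}

interface TextSimilarityGrader {
  type: 'text_similarity';
  name: string;
  input: string;
  reference: string;
  evaluation_metric: string; // the real type narrows this to a metric union
}

interface PythonGrader {
  type: 'python';
  name: string;
  source: string;
  image_tag?: string;
}

// Hypothetical union type (assumption: not an SDK export under this name).
type AnyGrader = StringCheckGrader | TextSimilarityGrader | PythonGrader;

// Hypothetical helper: switching on `type` narrows the union in each branch,
// so branch-specific fields such as `operation` are type-checked.
function describeGrader(grader: AnyGrader): string {
  switch (grader.type) {
    case 'string_check':
      return `${grader.name}: ${grader.operation} against a reference`;
    case 'text_similarity':
      return `${grader.name}: ${grader.evaluation_metric} similarity`;
    case 'python':
      return `${grader.name}: Python script (${grader.source.length} chars)`;
  }
}

const exactMatch: AnyGrader = {
  type: 'string_check',
  name: 'exact_match',
  input: '{{output}}',
  reference: '{{expected}}',
  operation: 'eq',
};

const summary = describeGrader(exactMatch);
```

The same `switch` extends to the remaining grader types by adding their interfaces to the union; the compiler then flags any unhandled case.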

Each grader type is a strongly typed interface identified by a literal type field (for example, 'string_check' or 'multi'), with its required and optional fields declared explicitly. The GraderInputs type supports multimodal content: plain strings, input text, output text, images, and audio. The TextSimilarityGrader supports the evaluation metrics cosine, fuzzy_match, BLEU, GLEU, METEOR, and ROUGE (rouge_1 through rouge_5, and rouge_l).

The ScoreModelGrader and LabelModelGrader both accept message-based inputs, where each message has a role from the instruction hierarchy (user, assistant, system, developer), and both support template strings for dynamic content injection. The ScoreModelGrader additionally accepts an optional score range and optional sampling parameters: temperature, top_p, seed, reasoning effort, and a maximum completion token count (max_completions_tokens).
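A minimal sketch of a ScoreModelGrader that combines role-based messages with sampling parameters is shown here. The interfaces are simplified local mirrors of the shapes in the Signature and Key Types sections (message content is restricted to plain strings, and `reasoning_effort` is omitted because its type is defined elsewhere); the grader name, prompt wording, and parameter values are illustrative assumptions.

```typescript
// Simplified local mirrors; declared so the sketch compiles without the SDK.
type Role = 'user' | 'assistant' | 'system' | 'developer';

interface Message {
  role: Role;
  content: string; // plain strings only in this sketch
}

interface SamplingParams {
  max_completions_tokens?: number | null; // field name as listed under Key Types
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}

interface ScoreModelGrader {
  type: 'score_model';
  name: string;
  model: string;
  input: Message[];
  range?: number[];
  sampling_params?: SamplingParams;
}

// Illustrative values: a developer message sets the rubric, a user message
// injects the content to grade through a template string.
const helpfulness: ScoreModelGrader = {
  type: 'score_model',
  name: 'helpfulness',
  model: 'gpt-4o',
  input: [
    { role: 'developer', content: 'Score the answer from 0 to 1 for helpfulness.' },
    { role: 'user', content: 'Answer: {{output}}' },
  ],
  range: [0, 1],
  sampling_params: { temperature: 0, seed: 42, max_completions_tokens: 256 },
};
```

Setting a low temperature and a fixed seed is one way to aim for more repeatable scores; whether that is appropriate depends on the evaluation.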

Usage

Use these types when working with the OpenAI Evals API to define grading criteria for evaluating model outputs. These types are referenced when creating evaluation configurations, setting up automated testing pipelines, or building custom evaluation frameworks that leverage OpenAI's grading infrastructure.

Code Reference

Source Location

Repository: openai-node
File: src/resources/graders/grader-models.ts

Signature

export class GraderModels extends APIResource {}

export interface StringCheckGrader {
  input: string;
  name: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
  reference: string;
  type: 'string_check';
}

export interface TextSimilarityGrader {
  evaluation_metric: 'cosine' | 'fuzzy_match' | 'bleu' | 'gleu' | 'meteor'
    | 'rouge_1' | 'rouge_2' | 'rouge_3' | 'rouge_4' | 'rouge_5' | 'rouge_l';
  input: string;
  name: string;
  reference: string;
  type: 'text_similarity';
}

export interface PythonGrader {
  name: string;
  source: string;
  type: 'python';
  image_tag?: string;
}

export interface ScoreModelGrader {
  input: Array<ScoreModelGrader.Input>;
  model: string;
  name: string;
  type: 'score_model';
  range?: Array<number>;
  sampling_params?: ScoreModelGrader.SamplingParams;
}

export interface LabelModelGrader {
  input: Array<LabelModelGrader.Input>;
  labels: Array<string>;
  model: string;
  name: string;
  passing_labels: Array<string>;
  type: 'label_model';
}

export interface MultiGrader {
  calculate_output: string;
  graders: StringCheckGrader | TextSimilarityGrader | PythonGrader
    | ScoreModelGrader | LabelModelGrader;
  name: string;
  type: 'multi';
}

Import

import OpenAI from 'openai';

I/O Contract

Inputs

Name	Type	Required	Description
input	`string`	Yes (StringCheck/TextSimilarity)	The input text to grade (supports template strings); ScoreModel and LabelModel take an array of messages instead
name	`string`	Yes	The name of the grader
reference	`string`	Yes (StringCheck/TextSimilarity)	The reference text to compare against
operation	`'eq' \| 'ne' \| 'like' \| 'ilike'`	Yes (StringCheck)	The string comparison operation
evaluation_metric	`'cosine' \| 'fuzzy_match' \| 'bleu' \| ...`	Yes (TextSimilarity)	The similarity metric to use
source	`string`	Yes (PythonGrader)	The Python script source code
model	`string`	Yes (ScoreModel/LabelModel)	The model to use for evaluation
labels	`Array<string>`	Yes (LabelModel)	Labels to assign during evaluation
passing_labels	`Array<string>`	Yes (LabelModel)	Labels that indicate a passing result
calculate_output	`string`	Yes (MultiGrader)	Formula to calculate combined output

Outputs

Name	Type	Description
type	`string`	The grader type identifier (e.g., 'string_check', 'text_similarity', 'python', 'score_model', 'label_model', 'multi')
name	`string`	The grader name

Usage Examples

import OpenAI from 'openai';

// The grader types are reached through the Graders namespace on the client.

// StringCheckGrader: exact match comparison
const stringCheckGrader: OpenAI.Graders.GraderModels.StringCheckGrader = {
  type: 'string_check',
  name: 'exact_match',
  input: '{{output}}',
  reference: '{{expected}}',
  operation: 'eq',
};

// TextSimilarityGrader: cosine similarity
const textSimilarityGrader: OpenAI.Graders.GraderModels.TextSimilarityGrader = {
  type: 'text_similarity',
  name: 'cosine_sim',
  input: '{{output}}',
  reference: '{{expected}}',
  evaluation_metric: 'cosine',
};

// PythonGrader: custom Python evaluation.
// The script defines grade(sample, item) and returns a float;
// the "expected" key assumes the dataset item has such a field.
const pythonGrader: OpenAI.Graders.GraderModels.PythonGrader = {
  type: 'python',
  name: 'custom_eval',
  source: 'def grade(sample, item):\n    return float(sample["output_text"] == item["expected"])',
};

// ScoreModelGrader: model-based scoring
const scoreModelGrader: OpenAI.Graders.GraderModels.ScoreModelGrader = {
  type: 'score_model',
  name: 'quality_score',
  model: 'gpt-4o',
  input: [
    { role: 'user', content: 'Rate the quality of: {{output}}' },
  ],
  range: [0, 1],
};
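The examples above do not cover LabelModelGrader, so a minimal sketch follows. It uses a simplified local mirror of the interface from the Signature section (message content is restricted to plain strings). The helpers `unknownPassingLabels` and `isPassing` are hypothetical client-side checks and not part of openai-node; the label names and prompt are illustrative.

```typescript
// Simplified local mirror; declared so the sketch compiles without the SDK.
interface LabelModelGrader {
  type: 'label_model';
  name: string;
  model: string;
  input: Array<{
    role: 'user' | 'assistant' | 'system' | 'developer';
    content: string;
  }>;
  labels: string[];
  passing_labels: string[];
}

// LabelModelGrader: model-based label assignment
const sentimentGrader: LabelModelGrader = {
  type: 'label_model',
  name: 'sentiment_label',
  model: 'gpt-4o',
  input: [
    { role: 'developer', content: 'Classify the sentiment of the reply.' },
    { role: 'user', content: 'Reply: {{output}}' },
  ],
  labels: ['positive', 'neutral', 'negative'],
  passing_labels: ['positive', 'neutral'],
};

// Hypothetical sanity check: returns passing labels missing from `labels`.
function unknownPassingLabels(grader: LabelModelGrader): string[] {
  return grader.passing_labels.filter((label) => !grader.labels.includes(label));
}

// Hypothetical helper: true when an assigned label counts as a pass.
function isPassing(grader: LabelModelGrader, label: string): boolean {
  return grader.passing_labels.includes(label);
}

const misconfigured = unknownPassingLabels(sentimentGrader);
```

Running a check like `unknownPassingLabels` before submitting a configuration can catch typos in `passing_labels` early.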

Key Types

GraderInputs

type GraderInputs = Array<
  | string
  | ResponseInputText
  | GraderInputs.OutputText
  | GraderInputs.InputImage
  | ResponseInputAudio
>;
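As a rough illustration of how a GraderInputs array mixes plain strings with typed parts, the sketch below covers only the text variants. The field layouts of `InputText` and `OutputText` are assumptions (this page names the types but does not list their fields), image and audio parts are left out for the same reason, and `joinText` is a hypothetical helper.

```typescript
// Assumed shapes: the page names these types but does not show their fields.
interface InputText {
  type: 'input_text';
  text: string;
}

interface OutputText {
  type: 'output_text';
  text: string;
}

// Text-only subset of GraderInputs; image and audio parts are omitted here.
type TextOnlyGraderInputs = Array<string | InputText | OutputText>;

// Hypothetical helper: flattens the parts into one newline-separated string.
function joinText(parts: TextOnlyGraderInputs): string {
  return parts
    .map((part) => (typeof part === 'string' ? part : part.text))
    .join('\n');
}

const parts: TextOnlyGraderInputs = [
  'Grade the reply below.',
  { type: 'input_text', text: 'Question: What is 2 + 2?' },
  { type: 'output_text', text: '4' },
];

const flattened = joinText(parts);
```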

ScoreModelGrader.SamplingParams

interface SamplingParams {
  max_completions_tokens?: number | null;
  reasoning_effort?: ReasoningEffort | null;
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}

Related Pages

Environment:Openai_Openai_node_Node_20_Runtime
