Implementation:Openai Openai node Grader Models

From Leeroopedia
Knowledge Sources
Domains SDK, Evaluations, Grading
Last Updated 2026-02-15 12:00 GMT

Overview

The GraderModels module defines TypeScript types for evaluation grader models used to score and assess model outputs in the OpenAI evaluations system.

Description

The GraderModels resource class extends APIResource and declares no methods of its own; it acts as a namespace for the grader type definitions used by the OpenAI Evals API. The module defines six grader types: StringCheckGrader for string comparison, TextSimilarityGrader for metric-based similarity grading, PythonGrader for custom Python script evaluation, ScoreModelGrader for model-based scoring, LabelModelGrader for model-based label assignment, and MultiGrader for combining graders into a single score.

Each grader is a typed interface with a fixed set of required fields, optional settings, and a string-literal type field that identifies the grader. The GraderInputs type supports multimodal content: plain strings, input text, output text, images, and audio. The TextSimilarityGrader supports the following evaluation metrics: cosine similarity, fuzzy matching, BLEU, GLEU, METEOR, and ROUGE variants (1 through 5, and L).
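When grader configurations arrive as untyped JSON, the metric name can be checked at runtime before it is used as a TextSimilarityGrader metric. The following is a minimal sketch: TEXT_SIMILARITY_METRICS and isTextSimilarityMetric are hypothetical helper names, not SDK exports, and the list simply mirrors the evaluation_metric union shown in the Signature section.

```typescript
// Hypothetical helper list: copied from the evaluation_metric union on this page.
// The SDK does not export this array.
const TEXT_SIMILARITY_METRICS = [
  'cosine', 'fuzzy_match', 'bleu', 'gleu', 'meteor',
  'rouge_1', 'rouge_2', 'rouge_3', 'rouge_4', 'rouge_5', 'rouge_l',
] as const;

type TextSimilarityMetric = (typeof TEXT_SIMILARITY_METRICS)[number];

// Type guard: narrows an arbitrary string to one of the known metric names.
function isTextSimilarityMetric(value: string): value is TextSimilarityMetric {
  return (TEXT_SIMILARITY_METRICS as readonly string[]).indexOf(value) !== -1;
}
```

After the guard succeeds, the value can be assigned to an evaluation_metric field without a cast.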

The ScoreModelGrader and LabelModelGrader both accept message-based inputs with role-based instruction hierarchy (user, assistant, system, developer) and support template strings for dynamic content injection. The ScoreModelGrader includes configurable sampling parameters such as temperature, top_p, reasoning effort, and max completion tokens.
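As an illustration of role-based message inputs, the sketch below builds a label-model grader with a developer instruction followed by a user message containing a template placeholder. The types are small local mirrors of the interfaces on this page so that the snippet stands alone; in real code the SDK's own LabelModelGrader type would be used. The placeholder name {{output}}, the grader name, and the passingLabelsAreDeclared check are illustrative assumptions.

```typescript
// Local mirrors of the shapes described on this page (not the SDK types themselves).
type Role = 'user' | 'assistant' | 'system' | 'developer';

interface MessageInput {
  role: Role;
  content: string;
}

interface LabelModelGraderSketch {
  type: 'label_model';
  name: string;
  model: string;
  input: MessageInput[];
  labels: string[];
  passing_labels: string[];
}

const sentimentGrader: LabelModelGraderSketch = {
  type: 'label_model',
  name: 'sentiment_label',
  model: 'gpt-4o',
  input: [
    // Developer message carries the grading instruction.
    { role: 'developer', content: 'Classify the sentiment as positive, neutral, or negative.' },
    // User message injects the graded text through a template string (assumed placeholder name).
    { role: 'user', content: 'Text: {{output}}' },
  ],
  labels: ['positive', 'neutral', 'negative'],
  passing_labels: ['positive'],
};

// Assumed sanity check (the types do not enforce it):
// every passing label should also appear in the declared labels.
function passingLabelsAreDeclared(grader: LabelModelGraderSketch): boolean {
  return grader.passing_labels.every((label) => grader.labels.indexOf(label) !== -1);
}
```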

Usage

Use these types with the OpenAI Evals API to define how model outputs are graded, for example when creating evaluation configurations, setting up automated testing pipelines, or building custom evaluation frameworks on top of OpenAI's grading infrastructure.

Code Reference

Source Location

Signature

export class GraderModels extends APIResource {}

export interface StringCheckGrader {
  input: string;
  name: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
  reference: string;
  type: 'string_check';
}

export interface TextSimilarityGrader {
  evaluation_metric: 'cosine' | 'fuzzy_match' | 'bleu' | 'gleu' | 'meteor'
    | 'rouge_1' | 'rouge_2' | 'rouge_3' | 'rouge_4' | 'rouge_5' | 'rouge_l';
  input: string;
  name: string;
  reference: string;
  type: 'text_similarity';
}

export interface PythonGrader {
  name: string;
  source: string;
  type: 'python';
  image_tag?: string;
}

export interface ScoreModelGrader {
  input: Array<ScoreModelGrader.Input>;
  model: string;
  name: string;
  type: 'score_model';
  range?: Array<number>;
  sampling_params?: ScoreModelGrader.SamplingParams;
}

export interface LabelModelGrader {
  input: Array<LabelModelGrader.Input>;
  labels: Array<string>;
  model: string;
  name: string;
  passing_labels: Array<string>;
  type: 'label_model';
}

export interface MultiGrader {
  calculate_output: string;
  graders: StringCheckGrader | TextSimilarityGrader | PythonGrader
    | ScoreModelGrader | LabelModelGrader;
  name: string;
  type: 'multi';
}

Import

import OpenAI from 'openai';

I/O Contract

Inputs

Name Type Required Description
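input string Yes (StringCheck/TextSimilarity) The input text to grade (supports template strings)
input Array<Input> Yes (ScoreModel/LabelModel) The input messages for the grading model (supports template strings)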
name string Yes The name of the grader
reference string Yes (StringCheck/TextSimilarity) The reference text to compare against
operation 'eq' | 'ne' | 'like' | 'ilike' Yes (StringCheck) The string comparison operation
evaluation_metric 'cosine' | 'fuzzy_match' | 'bleu' | ... Yes (TextSimilarity) The similarity metric to use
source string Yes (PythonGrader) The Python script source code
model string Yes (ScoreModel/LabelModel) The model to use for evaluation
labels Array<string> Yes (LabelModel) Labels to assign during evaluation
passing_labels Array<string> Yes (LabelModel) Labels that indicate a passing result
calculate_output string Yes (MultiGrader) Formula to calculate combined output
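The required fields above can also be checked at runtime when a grader configuration is loaded from JSON rather than written as typed code. This is a minimal sketch under that assumption: REQUIRED_FIELDS and missingFields are hypothetical names, and the field lists are transcribed from the Signature section rather than taken from the SDK.

```typescript
// Required fields per grader type, transcribed from the Signature section of this page.
const REQUIRED_FIELDS: Record<string, readonly string[]> = {
  string_check: ['input', 'name', 'operation', 'reference', 'type'],
  text_similarity: ['evaluation_metric', 'input', 'name', 'reference', 'type'],
  python: ['name', 'source', 'type'],
  score_model: ['input', 'model', 'name', 'type'],
  label_model: ['input', 'labels', 'model', 'name', 'passing_labels', 'type'],
  multi: ['calculate_output', 'graders', 'name', 'type'],
};

// Returns the required field names that are absent from a parsed config.
// Throws when the `type` discriminator is not one of the six known values.
function missingFields(config: Record<string, unknown>): string[] {
  const key = String(config.type);
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, key)) {
    throw new Error(`Unknown grader type: ${key}`);
  }
  const required = REQUIRED_FIELDS[key] as readonly string[];
  return required.filter((field) => !(field in config));
}
```

The check only tests for presence; it does not validate field values such as the operation or metric name.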

Outputs

Name Type Description
type string The grader type identifier (e.g., 'string_check', 'text_similarity', 'python', 'score_model', 'label_model', 'multi')
name string The grader name
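Because each interface declares a distinct string literal for type, that field works as a discriminant for TypeScript's union narrowing. The sketch below shows the pattern with two small local mirrors of the interfaces from the Signature section; AnyGrader and describeGrader are hypothetical names introduced for illustration.

```typescript
// Minimal local mirrors of two grader shapes, kept small so the sketch is self-contained.
interface StringCheck {
  type: 'string_check';
  name: string;
  input: string;
  reference: string;
  operation: 'eq' | 'ne' | 'like' | 'ilike';
}

interface PythonCheck {
  type: 'python';
  name: string;
  source: string;
  image_tag?: string;
}

type AnyGrader = StringCheck | PythonCheck;

// Switching on `type` narrows the union, so each branch sees only its own fields.
function describeGrader(grader: AnyGrader): string {
  switch (grader.type) {
    case 'string_check':
      return `${grader.name}: ${grader.operation} against ${grader.reference}`;
    case 'python':
      return `${grader.name}: Python script (${grader.source.length} chars)`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is unhandled.
      const unreachable: never = grader;
      return unreachable;
    }
  }
}
```

The same pattern extends to all six grader types by adding them to the union and to the switch.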

Usage Examples

import OpenAI from 'openai';

// StringCheckGrader: exact match comparison
const stringCheckGrader: OpenAI.GraderModels.StringCheckGrader = {
  type: 'string_check',
  name: 'exact_match',
  input: '{{output}}',
  reference: '{{expected}}',
  operation: 'eq',
};

// TextSimilarityGrader: cosine similarity
const textSimilarityGrader: OpenAI.GraderModels.TextSimilarityGrader = {
  type: 'text_similarity',
  name: 'cosine_sim',
  input: '{{output}}',
  reference: '{{expected}}',
  evaluation_metric: 'cosine',
};

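// PythonGrader: custom Python evaluation
// Assumption: the script defines grade(sample, item) and returns a float;
// the dictionary keys used here are illustrative.
const pythonGrader: OpenAI.GraderModels.PythonGrader = {
  type: 'python',
  name: 'custom_eval',
  source: 'def grade(sample, item):\n    return float(sample["output_text"] == item["expected"])',
};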

// ScoreModelGrader: model-based scoring
const scoreModelGrader: OpenAI.GraderModels.ScoreModelGrader = {
  type: 'score_model',
  name: 'quality_score',
  model: 'gpt-4o',
  input: [
    { role: 'user', content: 'Rate the quality of: {{output}}' },
  ],
  range: [0, 1],
};

Key Types

GraderInputs

type GraderInputs = Array<
  | string
  | ResponseInputText
  | GraderInputs.OutputText
  | GraderInputs.InputImage
  | ResponseInputAudio
>;
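Since GraderInputs mixes plain strings with typed content parts, code that consumes it usually has to branch on each element. The sketch below collects only the textual parts. The part interfaces are simplified local stand-ins, and their field names (type, text) are assumptions for illustration; the SDK's ResponseInputText, OutputText, InputImage, and ResponseInputAudio definitions are the authoritative shapes.

```typescript
// Simplified stand-ins for the SDK content-part types (field names are assumed).
interface InputTextPart {
  type: 'input_text';
  text: string;
}

interface OutputTextPart {
  type: 'output_text';
  text: string;
}

// Images and audio are treated as opaque here; only their discriminator is modeled.
interface OtherPart {
  type: 'input_image' | 'input_audio';
}

type GraderInputsSketch = Array<string | InputTextPart | OutputTextPart | OtherPart>;

// Collects the textual parts of a mixed input array, skipping images and audio.
function collectText(inputs: GraderInputsSketch): string[] {
  const texts: string[] = [];
  for (const part of inputs) {
    if (typeof part === 'string') {
      texts.push(part);
    } else if (part.type === 'input_text' || part.type === 'output_text') {
      texts.push(part.text);
    }
  }
  return texts;
}
```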

ScoreModelGrader.SamplingParams

interface SamplingParams {
  max_completions_tokens?: number | null;
  reasoning_effort?: ReasoningEffort | null;
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}
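One way to keep sampling settings consistent across several score-model graders is to merge per-grader overrides into a shared default object. The sketch below uses a local mirror of the fields listed above; DEFAULT_SAMPLING, withDefaults, and the default values are hypothetical project choices, and reasoning_effort is typed as a plain string because ReasoningEffort is defined elsewhere in the SDK. The field name max_completions_tokens is spelled as it appears in the interface above.

```typescript
// Local mirror of the SamplingParams fields listed on this page.
interface SamplingParamsSketch {
  max_completions_tokens?: number | null;
  reasoning_effort?: string | null;
  seed?: number | null;
  temperature?: number | null;
  top_p?: number | null;
}

// Hypothetical project defaults: a fixed seed and zero temperature,
// chosen so that repeated grading runs are easier to compare.
const DEFAULT_SAMPLING: SamplingParamsSketch = { seed: 42, temperature: 0 };

// Later spreads win, so any override replaces the matching default.
function withDefaults(overrides: SamplingParamsSketch = {}): SamplingParamsSketch {
  return { ...DEFAULT_SAMPLING, ...overrides };
}
```

The result can be passed as the sampling_params field of a ScoreModelGrader.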
