

Implementation:Langfuse Langfuse DatasetItemValidator

From Leeroopedia
Knowledge Sources
Domains Datasets, Validation, Data Normalization
Last Updated 2026-02-14 00:00 GMT

Overview

Internal validator class for dataset item payloads. It combines JSON parsing and normalization with JSON Schema validation against the dataset's schemas, and it compiles each schema only once instead of once per item, which yields a 3800x+ speedup.

Description

The DatasetItemValidator class is an internal component of the DatasetService that handles the full lifecycle of preparing dataset item data for database storage. It has two core responsibilities:

1. JSON Normalization:

  • Parses JSON strings (sent by tRPC) and passes already-parsed objects (sent by the Public API) through unchanged
  • When normalizeOpts.sanitizeControlChars is set, strips problematic C0 and C1 control characters, such as NULL bytes (which PostgreSQL TEXT columns cannot store) and vertical tabs, while preserving newlines and tabs
  • Converts normalized values to Prisma-safe representations: null becomes Prisma.DbNull (a database NULL), while undefined stays undefined so that partial updates leave the column unchanged
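The normalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Langfuse's implementation: `DB_NULL` is a hypothetical stand-in for `Prisma.DbNull` so the sketch runs without Prisma, a string that is not valid JSON is assumed to be kept as a plain string, and keeping `\r` alongside `\n` and `\t`, as well as sanitizing object keys, are also assumptions.

```typescript
// Hypothetical stand-in for Prisma.DbNull so the sketch runs without Prisma.
const DB_NULL = Symbol("DbNull");

// C0 controls except \t (U+0009), \n (U+000A) and \r (U+000D), plus all C1 controls.
// Keeping \r is an assumption of this sketch.
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u0080-\u009F]/g;

// Recursively strip control characters from strings, array elements, and object keys/values.
function sanitize(value: unknown): unknown {
  if (typeof value === "string") return value.replace(CONTROL_CHARS, "");
  if (Array.isArray(value)) return value.map(sanitize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key.replace(CONTROL_CHARS, ""), sanitize(v)]),
    );
  }
  return value;
}

// tRPC sends JSON strings; the Public API sends already-parsed values.
function parseJsonOrPassThrough(value: unknown): unknown {
  if (typeof value !== "string") return value;
  try {
    return JSON.parse(value);
  } catch {
    return value; // assumption: a string that is not valid JSON is stored as-is
  }
}

function normalize(value: unknown, sanitizeControlChars = false): unknown {
  if (value === undefined) return undefined; // partial update: leave the column untouched
  const parsed = parseJsonOrPassThrough(value);
  if (parsed === null) return DB_NULL; // explicit null becomes a database NULL
  return sanitizeControlChars ? sanitize(parsed) : parsed;
}

// The JSON text contains an escaped NULL byte and an escaped tab; only the NULL byte is removed.
const item = normalize('{"text":"a\\u0000b\\tc"}', true) as { text: string };
console.log(JSON.stringify(item.text)); // "ab\tc"
```

Sanitizing after parsing matters: an escaped `\u0000` inside JSON text only becomes a real NULL byte once `JSON.parse` has run.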

2. Schema Validation:

  • Validates input and expectedOutput fields against the dataset's JSON schemas
  • Delegates to DatasetSchemaValidator, which compiles the schemas with Ajv once, in its constructor
  • Supports a normalizeUndefinedToNull option for CREATE operations, where an undefined field is stored as null in the database
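The validation step can be sketched as below, assuming compiled validators are plain functions that return a list of error messages. That is a simplification: Ajv's compiled functions return a boolean and expose details on their `errors` property. `CompiledValidator`, `validateItem`, and the toy `requirePrompt` validator are hypothetical names. The sketch shows the effect of `normalizeUndefinedToNull`: on CREATE, a missing field is validated as the null that will be stored. Skipping undefined fields otherwise is an assumption about update behavior.

```typescript
// Hypothetical simplification: a compiled validator returns error messages (empty = valid).
type CompiledValidator = (value: unknown) => string[];

interface ItemValidationResult {
  success: boolean;
  inputErrors: string[];
  expectedOutputErrors: string[];
}

function validateItem(
  validators: { input?: CompiledValidator; expectedOutput?: CompiledValidator },
  data: { input: unknown; expectedOutput: unknown },
  normalizeUndefinedToNull = false,
): ItemValidationResult {
  // On CREATE, a missing field is stored as null, so validate null instead of undefined.
  const stored = (v: unknown): unknown =>
    normalizeUndefinedToNull && v === undefined ? null : v;

  // Assumption: an undefined field (partial update) is not being changed, so it is not validated.
  // A missing validator (null/undefined schema) means no validation.
  const check = (validate: CompiledValidator | undefined, v: unknown): string[] =>
    validate && v !== undefined ? validate(v) : [];

  const inputErrors = check(validators.input, stored(data.input));
  const expectedOutputErrors = check(validators.expectedOutput, stored(data.expectedOutput));
  return {
    success: inputErrors.length === 0 && expectedOutputErrors.length === 0,
    inputErrors,
    expectedOutputErrors,
  };
}

// Toy validator for a schema like { type: "object", required: ["prompt"] }.
const requirePrompt: CompiledValidator = (v) =>
  typeof v === "object" && v !== null && "prompt" in v ? [] : ["must be an object with 'prompt'"];

// CREATE with no input: validated as null, so the schema rejects it.
console.log(validateItem({ input: requirePrompt }, { input: undefined, expectedOutput: undefined }, true).success); // false
// Update that leaves input untouched: nothing to validate.
console.log(validateItem({ input: requirePrompt }, { input: undefined, expectedOutput: undefined }).success); // true
```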

The class provides two public methods:

  • validateDatasetItemData: Validates already-parsed data against schemas. Used for internal validation where normalization has already occurred.
  • validateAndNormalize: The main entry point that combines normalization and validation. Returns either a success result with Prisma-ready values or an error result with detailed validation messages (input errors and/or expected output errors).

Performance: The constructor compiles schemas once via DatasetSchemaValidator, and the compiled validators are reused for all subsequent validations. This provides a 3800x+ speedup compared to fresh Ajv compilation per item.
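The compile-once pattern can be illustrated with a toy compiler in place of Ajv; this is a deliberate substitution so the sketch has no dependencies. `Schema`, `compileSchema`, `CompiledItemValidator`, and the `compileCalls` counter are hypothetical. The point is that compilation happens in the constructor and every later check reuses the compiled function.

```typescript
// Toy schema format and "compiler" standing in for Ajv (a deliberate substitution).
type Schema = { required?: string[] };
type Validate = (value: unknown) => boolean;

let compileCalls = 0; // counts how often the (assumed expensive) compile step runs

function compileSchema(schema: Schema): Validate {
  compileCalls += 1;
  const required = schema.required ?? [];
  return (value) => {
    if (typeof value !== "object" || value === null) return false;
    const obj = value as Record<string, unknown>;
    return required.every((key) => key in obj);
  };
}

// Hypothetical class mirroring the pattern: compile in the constructor, reuse afterwards.
class CompiledItemValidator {
  private readonly validateInput: Validate | null;

  constructor(inputSchema: Schema | null) {
    // A null schema means no validation, so nothing is compiled.
    this.validateInput = inputSchema ? compileSchema(inputSchema) : null;
  }

  isValid(input: unknown): boolean {
    return this.validateInput ? this.validateInput(input) : true;
  }
}

const validator = new CompiledItemValidator({ required: ["prompt"] });
const batch: unknown[] = [{ prompt: "a" }, { prompt: "b" }, {}, { prompt: "c" }];
const validCount = batch.filter((item) => validator.isValid(item)).length;
console.log(validCount, compileCalls); // 3 1
```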

Usage

This class is internal to the DatasetService. Use dataset-items repository methods instead of calling this class directly. Instantiate one DatasetItemValidator per batch of items sharing the same dataset schemas for optimal performance.
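On the caller side, the same idea means building one validator per dataset rather than one per item. Below is a minimal sketch with hypothetical names (`DatasetSchemas`, `PendingItem`, `buildValidator`, `validateBatch`). `buildValidator` stands in for `new DatasetItemValidator(...)` and only checks required top-level keys; the real repository API is not shown on this page.

```typescript
// Hypothetical shapes for illustration only.
interface DatasetSchemas { requiredInputKeys: string[] }
interface PendingItem { datasetId: string; input: unknown }
type ItemValidator = (input: unknown) => boolean;

let validatorsBuilt = 0;

// Stand-in for `new DatasetItemValidator(...)`: the expensive step to run once per dataset.
function buildValidator(schemas: DatasetSchemas): ItemValidator {
  validatorsBuilt += 1;
  return (input) => {
    if (typeof input !== "object" || input === null) return false;
    const obj = input as Record<string, unknown>;
    return schemas.requiredInputKeys.every((key) => key in obj);
  };
}

// Validate a mixed batch, building at most one validator per dataset.
function validateBatch(items: PendingItem[], schemasByDataset: Map<string, DatasetSchemas>): boolean[] {
  const cache = new Map<string, ItemValidator>();
  return items.map((item) => {
    let validate = cache.get(item.datasetId);
    if (!validate) {
      const schemas = schemasByDataset.get(item.datasetId);
      if (!schemas) throw new Error(`Unknown dataset: ${item.datasetId}`);
      validate = buildValidator(schemas);
      cache.set(item.datasetId, validate);
    }
    return validate(item.input);
  });
}

const schemasById = new Map<string, DatasetSchemas>([
  ["ds-a", { requiredInputKeys: ["prompt"] }],
  ["ds-b", { requiredInputKeys: [] }],
]);
const results = validateBatch(
  [
    { datasetId: "ds-a", input: { prompt: "x" } },
    { datasetId: "ds-a", input: {} },
    { datasetId: "ds-b", input: {} },
    { datasetId: "ds-b", input: null },
  ],
  schemasById,
);
console.log(results, validatorsBuilt); // [ true, false, true, false ] 2
```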

Code Reference

Source Location

Signature

export class DatasetItemValidator {
  constructor(params: {
    inputSchema: Record<string, unknown> | null | undefined;
    expectedOutputSchema: Record<string, unknown> | null | undefined;
  });

  public validateDatasetItemData(params: {
    input: unknown;
    expectedOutput: unknown;
    normalizeUndefinedToNull?: boolean;
  }): ValidateItemResult;

  public validateAndNormalize(params: {
    input: string | unknown | null | undefined;
    expectedOutput: string | unknown | null | undefined;
    metadata: string | unknown | null | undefined;
    normalizeOpts?: { sanitizeControlChars?: boolean };
    validateOpts: { normalizeUndefinedToNull?: boolean };
  }): ValidateAndNormalizeResult;
}

Import

import { DatasetItemValidator } from "@langfuse/shared/src/server/services/DatasetService/DatasetItemValidator";

I/O Contract

Inputs

Constructor

Name Type Required Description
inputSchema Record<string, unknown> | null | undefined Yes JSON Schema for validating the input field; null/undefined means no validation
expectedOutputSchema Record<string, unknown> | null | undefined Yes JSON Schema for validating the expectedOutput field; null/undefined means no validation

validateAndNormalize

Name Type Required Description
input string | unknown | null | undefined Yes The input data; JSON string or parsed object
expectedOutput string | unknown | null | undefined Yes The expected output data; JSON string or parsed object
metadata string | unknown | null | undefined Yes Metadata; JSON string or parsed object
normalizeOpts.sanitizeControlChars boolean No Whether to strip control characters before storage
validateOpts object Yes Validation options (may be an empty object)
validateOpts.normalizeUndefinedToNull boolean No Set to true for CREATE operations: undefined fields are treated as null, so a schema that requires a value rejects missing input

Outputs

validateAndNormalize (success)

Name Type Description
success true Indicates validation passed
input Prisma.InputJsonValue | undefined Normalized input ready for Prisma
expectedOutput Prisma.InputJsonValue | undefined Normalized expected output ready for Prisma
metadata Prisma.InputJsonValue | undefined Normalized metadata ready for Prisma

validateAndNormalize (error)

Name Type Description
success false Indicates validation failed
message string Human-readable error message
cause.inputErrors FieldValidationError[] Validation errors for the input field
cause.expectedOutputErrors FieldValidationError[] Validation errors for the expectedOutput field
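Because `success` is a literal `true`/`false` field, the two result shapes form a discriminated union that TypeScript can narrow. The sketch below assumes a simplified `FieldValidationError` with `path` and `message` fields (its real shape is not documented here) and uses `unknown` in place of `Prisma.InputJsonValue`; `formatResult` is a hypothetical helper.

```typescript
// Simplified stand-in; the real FieldValidationError shape is not documented on this page.
interface FieldValidationError { path: string; message: string }

type ValidateAndNormalizeResult =
  // `unknown` stands in for Prisma.InputJsonValue in this sketch.
  | { success: true; input?: unknown; expectedOutput?: unknown; metadata?: unknown }
  | {
      success: false;
      message: string;
      cause: { inputErrors: FieldValidationError[]; expectedOutputErrors: FieldValidationError[] };
    };

// Narrow on `success`, then flatten the per-field errors into one readable report.
function formatResult(result: ValidateAndNormalizeResult): string {
  if (result.success) return "ok";
  const lines = [
    ...result.cause.inputErrors.map((e) => `input${e.path}: ${e.message}`),
    ...result.cause.expectedOutputErrors.map((e) => `expectedOutput${e.path}: ${e.message}`),
  ];
  return [result.message, ...lines].join("\n");
}

console.log(
  formatResult({
    success: false,
    message: "Validation failed",
    cause: { inputErrors: [{ path: "/prompt", message: "must be string" }], expectedOutputErrors: [] },
  }),
);
// Validation failed
// input/prompt: must be string
```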

Usage Examples

import { DatasetItemValidator } from "./DatasetItemValidator";

// Create validator with dataset schemas (compiled once)
const validator = new DatasetItemValidator({
  inputSchema: { type: "object", required: ["prompt"] },
  expectedOutputSchema: null,
});

// Validate and normalize for a CREATE operation
const result = validator.validateAndNormalize({
  input: '{"prompt": "Hello"}',
  expectedOutput: null,
  metadata: '{"source": "api"}',
  normalizeOpts: { sanitizeControlChars: true },
  validateOpts: { normalizeUndefinedToNull: true },
});

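if (result.success) {
  // `prisma` is assumed to be an existing PrismaClient instance;
  // the remaining required dataset item fields are elided.
  await prisma.datasetItem.create({
    data: {
      input: result.input,
      expectedOutput: result.expectedOutput,
      metadata: result.metadata,
      // ...remaining fields
    },
  });
} else {
  // result.cause holds the per-field inputErrors and expectedOutputErrors
  throw new Error(result.message);
}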
