Implementation:Langfuse Langfuse DatasetItemValidator
| Knowledge Sources | |
|---|---|
| Domains | Datasets, Validation, Data Normalization |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Internal validator class for dataset item payloads. It combines JSON parsing and normalization with JSON Schema validation against the dataset's schemas, and compiles each schema only once, which gives a 3800x+ speedup over compiling a fresh validator for every item.
Description
The DatasetItemValidator class is an internal component of the DatasetService that handles the complete lifecycle of preparing dataset item data for database storage. It performs two core responsibilities:
1. JSON Normalization:
- Parses JSON strings (from tRPC) and passes through already-parsed objects (from Public API)
- Sanitizes problematic C0 and C1 control characters that PostgreSQL TEXT columns cannot store (NULL bytes, vertical tabs, etc.) while preserving newlines and tabs
- Converts normalized values to Prisma-safe representations (null becomes Prisma.DbNull, undefined stays undefined for partial updates)
2. Schema Validation:
- Validates input and expectedOutput fields against the dataset's JSON schemas
- Delegates to DatasetSchemaValidator, which compiles schemas once in the constructor using Ajv
- Supports a normalizeUndefinedToNull option for CREATE operations, where undefined becomes null in the database
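The normalization steps above can be sketched in a few lines. This is a minimal illustration, not the Langfuse source: normalizeJson, sanitize, toDbValue, and the DB_NULL sentinel (standing in for Prisma.DbNull) are hypothetical names, and keeping carriage returns alongside newlines and tabs is an assumption.

```typescript
// Hypothetical stand-in for Prisma.DbNull so the sketch has no dependencies.
const DB_NULL = { kind: "DbNull" } as const;

// C0 (U+0000-U+001F), DEL (U+007F), and C1 (U+0080-U+009F) control characters,
// excluding tab (U+0009), line feed (U+000A), and carriage return (U+000D).
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]/g;

// Recursively strip control characters from every string value in a JSON value.
function sanitize(value: unknown): unknown {
  if (typeof value === "string") return value.replace(CONTROL_CHARS, "");
  if (Array.isArray(value)) return value.map(sanitize);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = sanitize(source[key]);
    }
    return out;
  }
  return value;
}

function normalizeJson(
  raw: unknown,
  opts: { sanitizeControlChars?: boolean } = {},
): unknown {
  if (raw === undefined || raw === null) return raw;
  // JSON strings are parsed; already-parsed values pass through unchanged.
  // Invalid JSON throws here; error handling is out of scope for this sketch.
  const parsed: unknown = typeof raw === "string" ? JSON.parse(raw) : raw;
  return opts.sanitizeControlChars ? sanitize(parsed) : parsed;
}

function toDbValue(value: unknown): unknown {
  if (value === undefined) return undefined; // field is left out of a partial update
  if (value === null) return DB_NULL; // explicit database NULL
  return value;
}
```

Sanitizing after parsing means escaped sequences such as \u0000 inside a JSON string are caught once they have been decoded into real characters.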
The class provides two public methods:
- validateDatasetItemData: Validates already-parsed data against schemas. Used for internal validation where normalization has already occurred.
- validateAndNormalize: The main entry point that combines normalization and validation. Returns either a success result with Prisma-ready values or an error result with detailed validation messages (input errors and/or expected output errors).
Performance: The constructor compiles schemas once via DatasetSchemaValidator, and the compiled validators are reused for all subsequent validations. This provides a 3800x+ speedup compared to fresh Ajv compilation per item.
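To make the compile-once pattern concrete, here is a minimal sketch. The toy compileSchema function is a hypothetical stand-in for Ajv compilation that understands only type: "object" and required, and CompiledOnceValidator is an illustrative name rather than the real class. The sketch counts compile calls so the reuse is observable; it does not reproduce the 3800x figure.

```typescript
type Schema = { type?: string; required?: string[] };
type Validate = (data: unknown) => string[]; // returns error messages

let compileCount = 0;

// Hypothetical stand-in for an expensive schema compilation step.
function compileSchema(schema: Schema): Validate {
  compileCount += 1;
  const required = schema.required ?? [];
  return (data) => {
    const errors: string[] = [];
    const isObject =
      data !== null && typeof data === "object" && !Array.isArray(data);
    if (schema.type === "object" && !isObject) {
      errors.push("must be object");
      return errors;
    }
    if (isObject) {
      for (const key of required) {
        if (!(key in (data as Record<string, unknown>))) {
          errors.push(`missing required property '${key}'`);
        }
      }
    }
    return errors;
  };
}

class CompiledOnceValidator {
  private readonly validateInput: Validate | null;

  constructor(inputSchema: Schema | null | undefined) {
    // Compile in the constructor so every later call reuses the result.
    this.validateInput = inputSchema ? compileSchema(inputSchema) : null;
  }

  validate(input: unknown): string[] {
    // A null or undefined schema means no validation is performed.
    return this.validateInput ? this.validateInput(input) : [];
  }
}

// One validator instance serves the whole batch.
const batchValidator = new CompiledOnceValidator({
  type: "object",
  required: ["prompt"],
});
const batch: unknown[] = [{ prompt: "a" }, { prompt: "b" }, { other: 1 }];
const batchErrors = batch.map((item) => batchValidator.validate(item));
```

After the batch runs, compileCount is still 1 even though three items were validated, which is the property the real class relies on.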
Usage
This class is internal to the DatasetService. Use dataset-items repository methods instead of calling this class directly. Instantiate one DatasetItemValidator per batch of items sharing the same dataset schemas for optimal performance.
Code Reference
Source Location
- Repository: Langfuse
- File: packages/shared/src/server/services/DatasetService/DatasetItemValidator.ts
- Lines: 1-273
Signature
    export class DatasetItemValidator {
      constructor(params: {
        inputSchema: Record<string, unknown> | null | undefined;
        expectedOutputSchema: Record<string, unknown> | null | undefined;
      });

      public validateDatasetItemData(params: {
        input: unknown;
        expectedOutput: unknown;
        normalizeUndefinedToNull?: boolean;
      }): ValidateItemResult;

      public validateAndNormalize(params: {
        input: string | unknown | null | undefined;
        expectedOutput: string | unknown | null | undefined;
        metadata: string | unknown | null | undefined;
        normalizeOpts?: { sanitizeControlChars?: boolean };
        validateOpts: { normalizeUndefinedToNull?: boolean };
      }): ValidateAndNormalizeResult;
    }
Import
import { DatasetItemValidator } from "@langfuse/shared/src/server/services/DatasetService/DatasetItemValidator";
I/O Contract
Inputs
Constructor
| Name | Type | Required | Description |
|---|---|---|---|
| inputSchema | Record<string, unknown> \| null \| undefined | Yes | JSON Schema for validating the input field; null/undefined means no validation |
| expectedOutputSchema | Record<string, unknown> \| null \| undefined | Yes | JSON Schema for validating the expectedOutput field; null/undefined means no validation |
validateAndNormalize
| Name | Type | Required | Description |
|---|---|---|---|
| input | string \| unknown \| null \| undefined | Yes | The input data; JSON string or parsed object |
| expectedOutput | string \| unknown \| null \| undefined | Yes | The expected output data; JSON string or parsed object |
| metadata | string \| unknown \| null \| undefined | Yes | Metadata; JSON string or parsed object |
| normalizeOpts.sanitizeControlChars | boolean | No | Whether to strip control characters before storage |
| validateOpts.normalizeUndefinedToNull | boolean | No (within the required validateOpts object) | True for CREATE operations; enforces non-null input |
Outputs
validateAndNormalize (success)
| Name | Type | Description |
|---|---|---|
| success | true | Indicates validation passed |
| input | Prisma.InputJsonValue \| undefined | Normalized input ready for Prisma |
| expectedOutput | Prisma.InputJsonValue \| undefined | Normalized expected output ready for Prisma |
| metadata | Prisma.InputJsonValue \| undefined | Normalized metadata ready for Prisma |
validateAndNormalize (error)
| Name | Type | Description |
|---|---|---|
| success | false | Indicates validation failed |
| message | string | Human-readable error message |
| cause.inputErrors | FieldValidationError[] | Validation errors for the input field |
| cause.expectedOutputErrors | FieldValidationError[] | Validation errors for the expectedOutput field |
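Because the success and error shapes share a success discriminant, callers can narrow on it before reading the other fields. The sketch below mirrors the tables above with local types. It is illustrative only: Prisma.InputJsonValue is replaced by unknown, the path and message fields on FieldValidationError are assumptions (the actual shape is not shown here), and describeResult is a hypothetical helper.

```typescript
// Assumed shape; the real FieldValidationError type may differ.
type FieldValidationError = { path: string; message: string };

type ValidateAndNormalizeResult =
  | {
      success: true;
      input?: unknown;
      expectedOutput?: unknown;
      metadata?: unknown;
    }
  | {
      success: false;
      message: string;
      cause: {
        inputErrors?: FieldValidationError[];
        expectedOutputErrors?: FieldValidationError[];
      };
    };

// Turn a result into a single human-readable string.
function describeResult(result: ValidateAndNormalizeResult): string {
  if (result.success === true) return "ok";
  // From here on the compiler knows this is the error variant.
  const lines = [result.message];
  for (const err of result.cause.inputErrors ?? []) {
    lines.push(`input${err.path}: ${err.message}`);
  }
  for (const err of result.cause.expectedOutputErrors ?? []) {
    lines.push(`expectedOutput${err.path}: ${err.message}`);
  }
  return lines.join("\n");
}
```

Either error list may be absent in this sketch, matching the "input errors and/or expected output errors" wording of the description.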
Usage Examples
    import { DatasetItemValidator } from "./DatasetItemValidator";

    // Create validator with dataset schemas (compiled once)
    const validator = new DatasetItemValidator({
      inputSchema: { type: "object", required: ["prompt"] },
      expectedOutputSchema: null,
    });

    // Validate and normalize for a CREATE operation
    const result = validator.validateAndNormalize({
      input: '{"prompt": "Hello"}',
      expectedOutput: null,
      metadata: '{"source": "api"}',
      normalizeOpts: { sanitizeControlChars: true },
      validateOpts: { normalizeUndefinedToNull: true },
    });

    if (result.success) {
      // Assumes an initialized Prisma client named `prisma` is in scope
      // and that this code runs where `await` is allowed.
      await prisma.datasetItem.create({
        data: { input: result.input /* , ... remaining fields elided */ },
      });
    } else {
      throw new Error(result.message);
    }