Principle: Promptfoo Assertion Grading
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Quality_Assurance |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
A multi-strategy grading mechanism that evaluates LLM outputs against deterministic checks, embedding similarity, and LLM-as-judge rubrics.
Description
Assertion Grading is the process of scoring LLM outputs to determine whether they meet expected quality criteria. This is the critical quality gate in the evaluation pipeline.
Promptfoo supports three categories of assertions:
- Deterministic: Exact match, contains, regex, JSON schema validation, cost/latency thresholds
- Similarity-based: Cosine similarity between output and expected embeddings
- Model-graded: Using another LLM to judge output quality against a rubric (llm-rubric, factuality, answer-relevance)
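As an illustration, a single test can mix one assertion from each category. The values below are placeholders, not output from a real evaluation:

```yaml
tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      # Deterministic: substring check
      - type: contains
        value: "Paris"
      # Similarity-based: cosine similarity against an expected answer
      - type: similar
        value: "The capital of France is Paris."
        threshold: 0.8
      # Model-graded: an LLM judges the output against a rubric
      - type: llm-rubric
        value: "Answers the question accurately and concisely"
```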
The grading system also supports:
- Threshold-based scoring: A test passes if its weighted assertion score meets or exceeds a configurable threshold
- Named metrics: Assertions can contribute to named metrics for aggregation
- Custom functions: JavaScript, Python, or Ruby functions as assertions
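A custom Python assertion is a file exposing a `get_assert(output, context)` function that promptfoo calls with the model output; it can return a bool, a float score, or a GradingResult-style dict. The check below (a required keyword) is purely illustrative:

```python
def get_assert(output, context):
    """Custom assertion sketch: pass if the output mentions a refund.

    Returns a GradingResult-style dict with pass/score/reason, the same
    shape the built-in matchers record for each assertion.
    """
    required = "refund"  # hypothetical criterion for this example
    ok = required in output.lower()
    return {
        "pass": ok,
        "score": 1.0 if ok else 0.0,
        "reason": f"output {'contains' if ok else 'is missing'} '{required}'",
    }

if __name__ == "__main__":
    print(get_assert("We will process your refund within 5 days.", {}))
```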
Usage
Use this principle after evaluation execution to determine pass/fail status for each test case. This is the fifth step in the pipeline and directly determines the quality metrics reported to users.
Theoretical Basis
The grading algorithm processes assertions sequentially within each test case:
Pseudo-code Logic:
1. For each assertion in test.assert:
a. Determine assertion type (deterministic, similarity, model-graded, custom)
b. Execute the appropriate matcher function
c. Record: { pass: boolean, score: number, reason: string }
2. Aggregate component results:
a. Calculate weighted score across all assertions
b. Compare against test.threshold (default: 1.0 = all must pass)
c. Determine overall pass/fail
3. Return GradingResult with component details
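The aggregation step above can be sketched in Python. This is a minimal model of the logic, not promptfoo's internals; the names `ComponentResult` and `grade` are illustrative:

```python
from dataclasses import dataclass, asdict

@dataclass
class ComponentResult:
    passed: bool   # step 1c: per-assertion pass
    score: float   # step 1c: per-assertion score
    reason: str
    weight: float = 1.0

def grade(components, threshold=None):
    """Aggregate component results into an overall GradingResult.

    Computes the weighted mean score (step 2a), then compares it against
    the test threshold if one is set; otherwise every component must pass
    (step 2b-c). Returns the result with component details (step 3).
    """
    total_weight = sum(c.weight for c in components) or 1.0
    score = sum(c.score * c.weight for c in components) / total_weight
    if threshold is not None:
        passed = score >= threshold
    else:
        passed = all(c.passed for c in components)
    return {
        "pass": passed,
        "score": score,
        "componentResults": [asdict(c) for c in components],
    }
```

For example, with one passing and one failing equally weighted assertion, the weighted score is 0.5: the test passes under `threshold=0.5` but fails when no threshold is set, since not all components passed.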