Principle:Promptfoo Promptfoo Assertion Grading

From Leeroopedia
Knowledge Sources
Domains Evaluation, Quality_Assurance
Last Updated 2026-02-14 08:00 GMT

Overview

A multi-strategy grading mechanism that evaluates LLM outputs against deterministic checks, embedding similarity, and LLM-as-judge rubrics.

Description

Assertion Grading is the process of scoring LLM outputs to determine whether they meet expected quality criteria. This is the critical quality gate in the evaluation pipeline.

Promptfoo supports three categories of assertions:

  • Deterministic: Exact match, contains, regex, JSON schema validation, cost/latency thresholds
  • Similarity-based: Cosine similarity between the embedding of the output and the embedding of the expected text
  • Model-graded: Using another LLM to judge output quality against a rubric (llm-rubric, factuality, answer-relevance)
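
To make the three categories concrete, here is a minimal Python sketch of one matcher per category. It is not Promptfoo's implementation: every function name is hypothetical, the 0.75 similarity threshold is only an illustrative default, the embedding vectors are assumed to come from some embedding model outside the sketch, and the `judge` callable stands in for a call to a grading LLM.

```python
import math
import re
from typing import Callable, Sequence


def _result(ok: bool, score: float, reason: str) -> dict:
    # Every matcher reports the same shape: pass flag, score, reason.
    return {"pass": ok, "score": score, "reason": reason}


def grade_contains(output: str, expected: str) -> dict:
    """Deterministic: pass when the expected substring appears in the output."""
    ok = expected in output
    return _result(ok, 1.0 if ok else 0.0, f"substring {expected!r} found: {ok}")


def grade_regex(output: str, pattern: str) -> dict:
    """Deterministic: pass when the pattern matches anywhere in the output."""
    ok = re.search(pattern, output) is not None
    return _result(ok, 1.0 if ok else 0.0, f"pattern {pattern!r} matched: {ok}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def grade_similar(output_vec: Sequence[float], expected_vec: Sequence[float],
                  threshold: float = 0.75) -> dict:
    """Similarity-based: pass when cosine similarity reaches the threshold."""
    sim = cosine_similarity(output_vec, expected_vec)
    return _result(sim >= threshold, sim, f"similarity {sim:.3f} vs threshold {threshold}")


def grade_with_judge(output: str, rubric: str, judge: Callable[[str], str]) -> dict:
    """Model-graded: ask a judge model (hypothetical callable) for PASS or FAIL."""
    prompt = f"Rubric: {rubric}\nOutput: {output}\nAnswer PASS or FAIL."
    verdict = judge(prompt).strip().upper()
    ok = verdict.startswith("PASS")
    return _result(ok, 1.0 if ok else 0.0, f"judge verdict: {verdict}")
```

Deterministic matchers are cheap and reproducible, so they are usually worth running first; the similarity and model-graded matchers cost an extra model call each and can vary between runs.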

The grading system also supports:

  • Threshold-based scoring: A test passes if its weighted assertion score meets or exceeds a configurable threshold
  • Named metrics: Assertions can contribute to named metrics for aggregation
  • Custom functions: JavaScript, Python, or Ruby functions as assertions
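
As an illustration of a custom function, the sketch below shows what a Python assertion might look like. It assumes the convention of a `get_assert(output, context)` function that may return a boolean, a numeric score, or a result dictionary with `pass`, `score`, and `reason` keys, and that `context` carries the test variables under `vars`. Treat that contract as an assumption to check against the Promptfoo documentation; the `city` variable and the 50-word limit are made up for the example.

```python
from typing import Any, Dict, Union


def get_assert(output: str, context: Dict[str, Any]) -> Union[bool, float, Dict[str, Any]]:
    """Custom check: the answer should name the expected city and stay brief."""
    # Hypothetical test variable; assumed to be supplied by the test case.
    expected_city = context.get("vars", {}).get("city", "")

    mentions_city = bool(expected_city) and expected_city.lower() in output.lower()
    is_brief = len(output.split()) <= 50  # illustrative word limit

    # Each criterion contributes half of the score.
    score = (0.5 if mentions_city else 0.0) + (0.5 if is_brief else 0.0)

    return {
        "pass": mentions_city and is_brief,
        "score": score,
        "reason": f"mentions_city={mentions_city}, is_brief={is_brief}",
    }
```

Returning a partial score rather than a bare boolean lets the assertion contribute proportionally to a weighted test score and to any named metric it is attached to.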

Usage

Use this principle after evaluation execution to determine pass/fail status for each test case. This is the fifth step in the pipeline and directly determines the quality metrics reported to users.

Theoretical Basis

The grading algorithm processes assertions sequentially within each test case:

Pseudo-code Logic:

1. For each assertion in test.assert:
   a. Determine assertion type (deterministic, similarity, model-graded, custom)
   b. Execute the appropriate matcher function
   c. Record: { pass: boolean, score: number, reason: string }
2. Aggregate component results:
   a. Calculate weighted score across all assertions
   b. Compare against test.threshold (default: 1.0 = all must pass)
   c. Determine overall pass/fail
3. Return GradingResult with component details
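
The steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not Promptfoo's code: the class and field names are hypothetical, the weighted score is taken to be the weight-normalized mean of component scores, a test with no assertions is treated as passing, and named metrics are simply summed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ComponentResult:
    passed: bool
    score: float
    reason: str


@dataclass
class Assertion:
    check: Callable[[str], ComponentResult]  # matcher for this assertion
    weight: float = 1.0
    metric: Optional[str] = None  # optional named metric to contribute to


@dataclass
class GradingResult:
    passed: bool
    score: float
    components: List[ComponentResult] = field(default_factory=list)
    named_scores: Dict[str, float] = field(default_factory=dict)


def make_contains(expected: str) -> Callable[[str], ComponentResult]:
    """Build a deterministic 'contains' matcher (illustrative helper)."""
    def check(output: str) -> ComponentResult:
        ok = expected in output
        return ComponentResult(ok, 1.0 if ok else 0.0, f"contains {expected!r}: {ok}")
    return check


def grade(output: str, assertions: List[Assertion], threshold: float = 1.0) -> GradingResult:
    components: List[ComponentResult] = []
    named_scores: Dict[str, float] = {}
    weighted_sum = 0.0
    total_weight = 0.0

    # Step 1: run each assertion's matcher and record its result.
    for assertion in assertions:
        result = assertion.check(output)
        components.append(result)
        weighted_sum += result.score * assertion.weight
        total_weight += assertion.weight
        if assertion.metric is not None:
            named_scores[assertion.metric] = named_scores.get(assertion.metric, 0.0) + result.score

    # Step 2: aggregate into a weighted score and compare with the threshold.
    # Assumption: no assertions (or zero total weight) counts as a full score.
    score = weighted_sum / total_weight if total_weight > 0 else 1.0
    passed = score >= threshold

    # Step 3: return the overall verdict together with component details.
    return GradingResult(passed, score, components, named_scores)
```

For example, with `Assertion(make_contains("Paris"), weight=2.0, metric="accuracy")` and `Assertion(make_contains("France"))`, the output `"Paris is lovely"` scores 2/3. It fails under the default threshold of 1.0 and passes if the threshold is lowered to 0.6.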

Related Pages

Implemented By
