Principle:ProtectAI LLM Guard Factual Consistency Checking

From Leeroopedia
Knowledge Sources
Domains Factual_Consistency, NLI
Last Updated 2026-02-14 12:00 GMT

Overview

Verifying that an LLM's output is factually consistent with its prompt by casting the pair as a Natural Language Inference premise and hypothesis.

Description

Factual consistency checking addresses the problem of hallucination in large language models -- cases where the model generates statements that are not supported by or directly contradict the information provided in the input. This principle uses Natural Language Inference (NLI) to formally evaluate whether the generated output is entailed by the original prompt.

In the NLI framework, the user's prompt serves as the premise (the ground truth or given information) and the LLM's output serves as the hypothesis (the claim to be verified). A DeBERTa model trained on NLI datasets produces probability scores across three categories:

  • Entailment -- the hypothesis logically follows from the premise.
  • Contradiction -- the hypothesis conflicts with the premise.
  • Neutral -- the hypothesis is neither supported nor contradicted.
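The three categories can be illustrated with a single premise and three candidate hypotheses. All texts below are invented examples, not outputs of any real model:

```python
# Invented texts illustrating the three NLI outcomes for one premise.
premise = "Our refund policy allows returns within 30 days of purchase."

hypotheses = {
    "Customers can return items within a month.": "entailment",  # follows from the premise
    "Refunds are never offered.": "contradiction",               # conflicts with the premise
    "Shipping takes five business days.": "neutral",             # neither supported nor contradicted
}

for hypothesis, label in hypotheses.items():
    print(f"{label:13s} {hypothesis}")
```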

If the entailment probability falls below a configured minimum threshold, the output is flagged as factually inconsistent, indicating the model may have hallucinated or introduced unsupported claims.

Usage

Apply this principle when factual accuracy relative to the input is critical:

  • Question-answering systems where the answer must be grounded in provided context.
  • Summarization tasks where the summary must not introduce claims absent from the source.
  • Retrieval-augmented generation pipelines where outputs should faithfully reflect retrieved documents.
  • Any application where hallucinated content could cause harm or erode trust.
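As a sketch of how such a check slots into one of these pipelines, the gate below returns the LLM's answer only when its entailment score against the provided context clears the threshold, and otherwise falls back to a refusal. The `nli_entailment_score` stand-in uses simple token overlap purely so the example runs; a real pipeline would call an NLI model here, and all names and the 0.7 threshold are illustrative assumptions:

```python
def nli_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer using token overlap; a real system would run
    a DeBERTa NLI model on the premise-hypothesis pair instead."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

def answer_with_guard(context: str, llm_answer: str, minimum_score: float = 0.7) -> str:
    """Return the answer only if it is sufficiently entailed by the context."""
    score = nli_entailment_score(context, llm_answer)
    if score < minimum_score:
        # Flagged as factually inconsistent: refuse rather than hallucinate.
        return "I cannot verify that answer against the provided context."
    return llm_answer

print(answer_with_guard("the capital of france is paris",
                        "the capital of france is paris"))
```

The fallback message is a design choice; other pipelines might instead log the event, retry generation, or surface the low confidence score to the caller.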

Theoretical Basis

The factual consistency check proceeds as follows:

1. Designate the user prompt as the premise and the LLM output as the hypothesis.
2. Tokenize the premise-hypothesis pair using the DeBERTa tokenizer with the [SEP] separator.
3. Pass the tokenized pair through the NLI classification model.
4. Apply softmax to the output logits to obtain class probabilities:
   P(entailment) = softmax(logits)[entailment_idx]
   P(contradiction) = softmax(logits)[contradiction_idx]
   P(neutral) = softmax(logits)[neutral_idx]
5. Compare P(entailment) against the minimum score threshold.
6. If P(entailment) < threshold, flag the output as factually inconsistent.
7. Return the entailment score as the confidence metric for downstream use.
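Steps 4 through 7 can be sketched in a few lines of stdlib Python. The logits, the label order, and the 0.5 threshold below are illustrative assumptions; a real check obtains the logits by running a DeBERTa NLI model on the tokenized premise-hypothesis pair from steps 1-3:

```python
import math

LABELS = ("entailment", "neutral", "contradiction")  # assumed label order

def softmax(logits):
    """Convert raw logits into class probabilities (step 4)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

def check_consistency(logits, minimum_score=0.5):
    """Compare P(entailment) to the threshold (steps 5-7)."""
    probs = dict(zip(LABELS, softmax(logits)))
    entailment = probs["entailment"]
    return {
        "score": entailment,                          # confidence metric for downstream use
        "is_consistent": entailment >= minimum_score, # below threshold => flagged
        "probabilities": probs,
    }

# Illustrative logits favouring entailment:
result = check_consistency([3.2, 0.1, -1.4])
print(result["is_consistent"], round(result["score"], 3))
```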
