Overview
FactualConsistency is an output scanner that verifies whether the LLM output is factually consistent with the input prompt using Natural Language Inference (NLI).
Description
The FactualConsistency output scanner is a standalone implementation rather than a thin wrapper around another scanner. It uses a Natural Language Inference (NLI) model to assess whether the LLM output is entailed by the input prompt; the default model is MoritzLaurer/deberta-v3-base-zeroshot-v2.0 (shared with the BanTopics scanner). The scanner tokenizes the prompt-output pair, runs it through the NLI model, and applies a softmax to the resulting logits to obtain an entailment probability. That entailment score is compared against the minimum_score threshold: if the score exceeds the threshold, the output is considered factually consistent. This approach helps detect hallucinations and fabricated information in LLM responses by checking whether the output logically follows from the provided context.
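The sketch below illustrates the kind of NLI entailment check described above, using Hugging Face transformers directly with the default model. It is a minimal approximation, not the scanner's actual code: the label lookup, preprocessing, and truncation behavior inside the scanner may differ.

```python
# Minimal sketch of an NLI entailment check, assuming the default model.
# Label ordering and preprocessing are assumptions, not the scanner's code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "MoritzLaurer/deberta-v3-base-zeroshot-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The capital of France is Paris."      # the prompt
hypothesis = "Paris is the capital of France."   # the LLM output

# Encode the prompt (premise) and output (hypothesis) as a single pair.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the logits yields class probabilities; the entailment index
# is read from the model config rather than hardcoded.
probs = torch.softmax(logits, dim=-1)[0]
entailment_idx = model.config.label2id.get("entailment", 0)
entailment_score = probs[entailment_idx].item()

minimum_score = 0.75
print(f"entailment={entailment_score:.3f}, "
      f"consistent={entailment_score > minimum_score}")
```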
Usage
Use this scanner when factual accuracy is critical and you need to verify that LLM outputs are grounded in the information provided in the prompt. This is especially useful in retrieval-augmented generation (RAG) pipelines where the prompt contains source documents, in question-answering systems, and in any application where hallucinated facts could cause harm or misinformation.
Code Reference
Source Location
Signature
```python
class FactualConsistency(Scanner):
    def __init__(
        self,
        *,
        model: Model | None = None,
        minimum_score: float = 0.75,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str, output: str) -> tuple[str, bool, float]: ...
```
Import
```python
from llm_guard.output_scanners import FactualConsistency
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | str | Yes | The input prompt (used as the premise for NLI) |
| output | str | Yes | The LLM output to check for factual consistency (used as the hypothesis) |
Constructor Parameters
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| model | Model \| None | No | None | Custom NLI model (defaults to MoritzLaurer/deberta-v3-base-zeroshot-v2.0) |
| minimum_score | float | No | 0.75 | Minimum entailment score to consider the output factually consistent |
| use_onnx | bool | No | False | Whether to use ONNX Runtime for inference |
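For non-default configurations, the constructor parameters can be combined as sketched below. The custom model path shown is illustrative; any NLI sequence-classification model compatible with llm_guard's Model wrapper should fit, assuming Model accepts a Hugging Face path as its first argument.

```python
from llm_guard.model import Model
from llm_guard.output_scanners import FactualConsistency

# Stricter threshold with ONNX Runtime inference enabled.
scanner = FactualConsistency(minimum_score=0.9, use_onnx=True)

# Swapping in a different NLI model via the Model wrapper.
# The model path here is an illustrative assumption, not a recommendation.
custom_model = Model(path="MoritzLaurer/deberta-v3-large-zeroshot-v2.0")
scanner_custom = FactualConsistency(model=custom_model, minimum_score=0.75)
```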
Outputs
| Name | Type | Description |
|------|------|-------------|
| sanitized_output | str | The output (unmodified) |
| is_valid | bool | Whether the output is factually consistent with the prompt |
| risk_score | float | Risk score (-1.0 to 1.0) |
Usage Examples
Basic Usage
```python
from llm_guard.output_scanners import FactualConsistency

scanner = FactualConsistency(minimum_score=0.75)

prompt = "The capital of France is Paris. Paris has a population of about 2.1 million."
output = "Paris is the capital of France with approximately 2.1 million residents."

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
if is_valid:
    print("Output is factually consistent with the prompt")
else:
    print(f"Possible hallucination detected (risk: {risk_score})")
```
RAG Pipeline Verification
```python
from llm_guard.output_scanners import FactualConsistency

scanner = FactualConsistency(minimum_score=0.8)

# Retrieved context as prompt
prompt = "According to the 2023 report, revenue increased by 15% to $2.3 billion."
# LLM-generated answer
output = "Revenue grew by 15% reaching $2.3 billion based on the latest report."

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
print(f"Factually consistent: {is_valid}, Score: {risk_score}")
```