Principle:Protectai Llm guard Output Relevance Checking

From Leeroopedia
Revision as of 17:46, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Protectai_Llm_guard_Output_Relevance_Checking.md)
Knowledge Sources
Domains: NLP, Quality_Assurance, Semantic_Similarity
Last Updated: 2026-02-14 12:00 GMT

Overview

A semantic similarity technique that measures the relevance of an LLM output to its input prompt by computing cosine similarity between their vector embeddings.

Description

Output relevance checking encodes both the prompt and the LLM response into dense vector embeddings using a pre-trained sentence embedding model (e.g., one from the BGE family). The cosine similarity between the two embeddings indicates how semantically related the output is to the prompt. An output whose similarity falls below a configurable threshold is flagged as irrelevant.
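The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the vectors are hypothetical 3-dimensional stand-ins for real embeddings, and the names `cosine_similarity` and `is_irrelevant` and the default threshold of 0.5 are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def is_irrelevant(prompt_vec, output_vec, threshold=0.5):
    """Flag the output when its similarity to the prompt is below threshold."""
    return cosine_similarity(prompt_vec, output_vec) < threshold

# Hypothetical 3-dimensional embeddings, for illustration only.
prompt_vec = [1.0, 2.0, 0.0]
on_topic_vec = [2.0, 4.0, 0.0]    # same direction as the prompt
off_topic_vec = [0.0, 0.0, 3.0]   # orthogonal to the prompt

print(cosine_similarity(prompt_vec, on_topic_vec))   # close to 1.0
print(cosine_similarity(prompt_vec, off_topic_vec))  # 0.0
```

Because cosine similarity depends only on direction, the on-topic vector scores close to 1.0 even though it is twice as long as the prompt vector, while the orthogonal vector scores 0.0 and would be flagged.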

This technique detects several failure modes:

  • Off-topic responses: The LLM generates content unrelated to the query.
  • Hallucinated tangents: The response starts relevant but drifts off-topic.
  • Adversarial outputs: Injection attacks cause the model to generate unrelated content.

Usage

Use this principle in output scanning pipelines to verify that LLM responses are semantically relevant to the original prompt. It is particularly useful in retrieval-augmented generation (RAG) pipelines and chatbot applications, where response quality is critical.
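One way to picture where the check sits in a pipeline is the sketch below: the response is generated, scored against the prompt, and released only if it passes. Everything here is an assumption made for illustration, including the names `toy_encode`, `relevance_score`, and `guarded_reply`, the threshold of 0.3, and the fallback message. In particular, `toy_encode` uses normalized word counts so that the example runs without a model; it captures lexical overlap only, whereas a real pipeline would call a pre-trained sentence embedding model at that point.

```python
import math
import re
from collections import Counter

def toy_encode(text):
    """Toy stand-in for a sentence-embedding model: L2-normalized word counts.

    A real pipeline would call a pre-trained embedding model here.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def relevance_score(prompt, output, encode=toy_encode):
    """Dot product of two normalized sparse vectors, i.e. cosine similarity."""
    p, o = encode(prompt), encode(output)
    return sum(weight * o.get(word, 0.0) for word, weight in p.items())

def guarded_reply(prompt, generate, threshold=0.3,
                  fallback="Sorry, I could not produce a relevant answer."):
    """Generate a response, then release it only if it passes the check."""
    output = generate(prompt)
    score = relevance_score(prompt, output)
    if score < threshold:
        return fallback, score
    return output, score

# Hypothetical generators standing in for LLM calls.
def on_topic(prompt):
    return "The capital of France is Paris."

def off_topic(prompt):
    return "Buy cheap watches online today!"

prompt = "What is the capital of France?"
print(guarded_reply(prompt, on_topic))   # original response is released
print(guarded_reply(prompt, off_topic))  # fallback replaces the response
```

Returning the score alongside the reply makes it easy to log borderline cases and tune the threshold for a given application.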

Theoretical Basis

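# Relevance checking via embedding similarity (Python sketch).
# `encode` stands in for any sentence-embedding function that returns
# L2-normalized dense vectors; the default threshold is illustrative.
def check_relevance(prompt, output, encode, threshold=0.5):
    prompt_embedding = encode(prompt)    # Dense vector via sentence model
    output_embedding = encode(output)    # Dense vector via sentence model

    # Cosine similarity (embeddings are L2-normalized, so a dot product suffices)
    similarity = sum(p * o for p, o in zip(prompt_embedding, output_embedding))

    # Flag outputs whose similarity falls below the threshold
    return "IRRELEVANT" if similarity < threshold else "RELEVANT"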

The BGE (BAAI General Embedding) models use CLS token pooling with L2 normalization, so cosine similarity reduces to a simple dot product.
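The reduction to a dot product can be checked numerically. The sketch below is illustrative only: the per-token hidden states are made-up 2-dimensional values, and the helper names (`cls_pool`, `l2_normalize`, `dot`, `cosine`) are assumptions rather than part of any library. It takes the first token's vector as the sentence embedding (CLS pooling), normalizes it to unit length, and compares the plain dot product with the full cosine formula.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: use the first token's hidden state as the sentence vector."""
    return token_embeddings[0]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical per-token hidden states (row 0 is the [CLS] position).
prompt_tokens = [[3.0, 4.0], [9.0, 9.0], [1.0, 0.0]]
output_tokens = [[4.0, 3.0], [0.0, 5.0]]

p_raw, o_raw = cls_pool(prompt_tokens), cls_pool(output_tokens)
p_unit, o_unit = l2_normalize(p_raw), l2_normalize(o_raw)

print(cosine(p_raw, o_raw))   # cosine similarity on the raw CLS vectors
print(dot(p_unit, o_unit))    # plain dot product after normalization
```

Both values agree up to floating-point rounding (24/25 = 0.96 for these vectors), because dividing each vector by its own norm beforehand moves the denominator of the cosine formula into the vectors themselves.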

Related Pages

Implemented By
