Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Avdvg InjectGuard Similarity Search Detection

From Leeroopedia
Knowledge Sources
Domains Security, Information_Retrieval, Anomaly_Detection
Last Updated 2026-02-14 16:00 GMT

Overview

A detection technique that classifies input text as malicious or benign by measuring its vector distance to the nearest known malicious prompt in an indexed corpus and comparing against a configurable threshold.

Description

Similarity search detection is the core inference step of the vector similarity pipeline. Given an input text, the system:

  1. Embeds the input into the same vector space as the malicious prompt corpus.
  2. Queries the FAISS index for the nearest neighbor (k=1).
  3. Compares the L2 distance to a configurable threshold (sim_k).
  4. Classifies the input as malicious (distance < threshold) or benign (distance >= threshold).

This approach is a form of nearest-neighbor classification applied to security: if an input is "close enough" to any known attack in embedding space, it is flagged. The threshold parameter sim_k controls the precision-recall tradeoff:

  • Lower sim_k: More aggressive detection (higher recall, lower precision). More inputs flagged as malicious.
  • Higher sim_k: More conservative detection (lower recall, higher precision). Only very close matches flagged.

Usage

Use this principle for real-time prompt injection detection where latency is critical and interpretability is desired. The similarity score and matched source provide explainable results (unlike black-box classifier models). It is most effective when the malicious prompt corpus has good coverage of known attack patterns.

Theoretical Basis

The detection decision function is:

detection(x)={1 (malicious)if minid(f(x),f(mi))<τ0 (benign)otherwise

Where:

  • f(x) is the embedding of input text x
  • f(mi) are embeddings of malicious prompts in the corpus
  • d(,) is L2 distance
  • τ is the similarity threshold (sim_k)

This is equivalent to a 1-nearest-neighbor classifier with a rejection threshold. The decision boundary forms a hypersphere of radius τ around each malicious prompt in embedding space.

Pseudo-code:

# Abstract algorithm for similarity search detection
query_vector = embedding_model.encode(input_text)
nearest_doc, distance = index.search(query_vector, k=1)

if distance < threshold:
    return MALICIOUS, distance, nearest_doc
else:
    return BENIGN, distance, nearest_doc

Related Pages

Implemented By

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment