Principle:Avdvg InjectGuard Similarity Search Detection

Knowledge Sources	Prompt Injection Attacks and Defenses Billion-scale similarity search with GPUs InjectGuard
Domains	Security, Information_Retrieval, Anomaly_Detection
Last Updated	2026-02-14 16:00 GMT

Overview

A detection technique that classifies input text as malicious or benign by measuring its vector distance to the nearest known malicious prompt in an indexed corpus and comparing against a configurable threshold.

Description

Similarity search detection is the core inference step of the vector similarity pipeline. Given an input text, the system:

Embeds the input into the same vector space as the malicious prompt corpus.
Queries the FAISS index for the nearest neighbor (k=1).
Compares the L2 distance to a configurable threshold (sim_k).
Classifies the input as malicious (distance < threshold) or benign (distance >= threshold).

This approach is a form of nearest-neighbor classification applied to security: if an input is "close enough" to any known attack in embedding space, it is flagged. The threshold parameter sim_k controls the precision-recall tradeoff:

Lower sim_k: More aggressive detection (higher recall, lower precision). More inputs flagged as malicious.
Higher sim_k: More conservative detection (lower recall, higher precision). Only very close matches flagged.

Usage

Use this principle for real-time prompt injection detection where latency is critical and interpretability is desired. The similarity score and matched source provide explainable results (unlike black-box classifier models). It is most effective when the malicious prompt corpus has good coverage of known attack patterns.

Theoretical Basis

The detection decision function is:

$detection (x) = {\begin{cases} 1 (malicious) & if \min_{i} d (f (x), f (m_{i})) < τ \\ 0 (benign) & otherwise \end{cases}$

Where:

$f (x)$ is the embedding of input text x
$f (m_{i})$ are embeddings of malicious prompts in the corpus
$d (\cdot, \cdot)$ is L2 distance
$τ$ is the similarity threshold (sim_k)

This is equivalent to a 1-nearest-neighbor classifier with a rejection threshold. The decision boundary forms a hypersphere of radius $τ$ around each malicious prompt in embedding space.

Pseudo-code:

# Abstract algorithm for similarity search detection
query_vector = embedding_model.encode(input_text)
nearest_doc, distance = index.search(query_vector, k=1)

if distance < threshold:
    return MALICIOUS, distance, nearest_doc
else:
    return BENIGN, distance, nearest_doc

Related Pages

Implemented By

Implementation:Avdvg_InjectGuard_Sim_Search

Uses Heuristic

Heuristic:Avdvg_InjectGuard_Sim_K_Threshold_Tuning

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment