Principle:Avdvg InjectGuard Similarity Search Detection
| Knowledge Sources | |
|---|---|
| Domains | Security, Information_Retrieval, Anomaly_Detection |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A detection technique that classifies input text as malicious or benign by measuring its vector distance to the nearest known malicious prompt in an indexed corpus and comparing against a configurable threshold.
Description
Similarity search detection is the core inference step of the vector similarity pipeline. Given an input text, the system:
- Embeds the input into the same vector space as the malicious prompt corpus.
- Queries the FAISS index for the nearest neighbor (k=1).
- Compares the L2 distance to a configurable threshold (sim_k).
- Classifies the input as malicious (distance < threshold) or benign (distance >= threshold).
This approach is a form of nearest-neighbor classification applied to security: if an input is "close enough" to any known attack in embedding space, it is flagged. The threshold parameter sim_k controls the precision-recall tradeoff:
- Lower sim_k: More aggressive detection (higher recall, lower precision). More inputs flagged as malicious.
- Higher sim_k: More conservative detection (lower recall, higher precision). Only very close matches flagged.
Usage
Use this principle for real-time prompt injection detection where latency is critical and interpretability is desired. The similarity score and matched source provide explainable results (unlike black-box classifier models). It is most effective when the malicious prompt corpus has good coverage of known attack patterns.
Theoretical Basis
The detection decision function is:
Where:
- is the embedding of input text x
- are embeddings of malicious prompts in the corpus
- is L2 distance
- is the similarity threshold (sim_k)
This is equivalent to a 1-nearest-neighbor classifier with a rejection threshold. The decision boundary forms a hypersphere of radius around each malicious prompt in embedding space.
Pseudo-code:
# Abstract algorithm for similarity search detection
query_vector = embedding_model.encode(input_text)
nearest_doc, distance = index.search(query_vector, k=1)
if distance < threshold:
return MALICIOUS, distance, nearest_doc
else:
return BENIGN, distance, nearest_doc