Implementation: InjectGuard FAISS From Documents
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Vector_Search, Security |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by the LangChain FAISS integration, for constructing a FAISS vector store from LangChain Document objects.
Description
The FAISS class in LangChain wraps Facebook AI Similarity Search to provide a vector store that can be built from documents and queried for nearest neighbors. The from_documents class method is the primary factory: it takes a list of Document objects and an embedding model, embeds all documents, and builds a FAISS index (IndexFlatL2 by default).
In InjectGuard, this constructs the malicious prompt vector database at module level. The resulting vector_store object is a module-level global used by the sim_search function for query-time detection.
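The from_documents factory pattern described above can be sketched in plain Python. All names below (TinyStore, ToyEmbedder) are illustrative stand-ins, not part of LangChain or InjectGuard; the only assumption is a duck-typed embedder exposing embed_documents, as noted in the I/O contract.

```python
# Minimal sketch of a from_documents-style factory (names are illustrative).
class TinyStore:
    def __init__(self, texts, vectors):
        self.texts = texts
        self.vectors = vectors

    @classmethod
    def from_documents(cls, texts, embedder):
        # Embed every document up front, then keep the vectors for search
        return cls(texts, embedder.embed_documents(texts))

class ToyEmbedder:
    # Toy 2-d embedding: (character count, vowel count)
    def embed_documents(self, texts):
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))]
                for t in texts]

store = TinyStore.from_documents(["ignore previous instructions", "hello"],
                                 ToyEmbedder())
print(len(store.vectors))  # 2
```

The real FAISS.from_documents follows the same two-step shape: batch-embed, then index.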
Key behaviors:
- Embeds all documents using the provided embedding model
- Builds a FAISS IndexFlatL2 (exact L2 search) by default
- Stores document-to-vector mappings for retrieval
- Supports both similarity_search (returns documents) and similarity_search_with_score (returns documents + distances)
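IndexFlatL2 is an exact index: it stores raw vectors and scans all of them at query time. A minimal pure-Python sketch of that search (squared L2, which is the score FAISS reports by default):

```python
def l2_sq(a, b):
    # Squared Euclidean distance, the score IndexFlatL2 returns
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, vectors, k=1):
    # Exact brute-force scan over every stored vector
    order = sorted(range(len(vectors)), key=lambda i: l2_sq(query, vectors[i]))
    return [(i, l2_sq(query, vectors[i])) for i in order[:k]]

db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(search([0.9, 0.1], db, k=1))  # nearest is index 0
```

Exactness means no recall loss, at the cost of O(n) work per query; fine for a malicious-prompt list of modest size.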
Usage
Use this when you need to build an in-memory vector index from a collection of LangChain Document objects. In InjectGuard, this is called once at module initialization to index all known malicious prompts, producing the searchable vector store used for detection.
Code Reference
Source Location
- Repository: InjectGuard
- File: injectguard/vertor_similarity_detection.py
- Lines: L4, L47
Signature
class FAISS:
    @classmethod
    def from_documents(
        cls,
        documents: list,
        embedding: object,
        **kwargs
    ) -> "FAISS":
        """
        Build a FAISS vector store from a list of Document objects.

        Args:
            documents: List of LangChain Document objects to index.
            embedding: Embedding model instance (must implement embed_documents).
            **kwargs: Additional arguments passed to FAISS index construction.

        Returns:
            FAISS: Initialized vector store with all documents indexed.
        """
Import
from langchain.vectorstores import FAISS
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| documents | list[Document] | Yes | LangChain Document objects to embed and index (from CSVLoader in Step 2) |
| embedding | Embeddings | Yes | Embedding model instance providing embed_documents() method (HuggingFaceEmbeddings from Step 1) |
Outputs
| Name | Type | Description |
|---|---|---|
| vector_store | FAISS | FAISS-backed vector store; supports similarity_search_with_score(query, k) for retrieval with L2 distances |
Usage Examples
InjectGuard Vector Store Construction (as used in the repo)
from langchain.vectorstores import FAISS
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders.csv_loader import CSVLoader
# Step 1: Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cuda:2'},
    encode_kwargs={'normalize_embeddings': True}
)
# Step 2: Load malicious dataset
loader = CSVLoader(file_path='./dataset/malicious_data_demo.csv')
docs = loader.load()
# Step 3: Build vector store
vector_store = FAISS.from_documents(docs, embeddings)
print("success build vector database!")
# Now vector_store can be queried:
results = vector_store.similarity_search_with_score("ignore previous instructions", k=1)
doc, score = results[0]
print(f"Nearest match: {doc.page_content}, L2 distance: {score}")
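Because Step 1 sets normalize_embeddings=True, every stored vector is unit length, and squared L2 distance becomes a monotone function of cosine similarity: d² = 2(1 − cos θ). A quick numeric check of that identity on unit vectors:

```python
import math

def l2(a, b):
    # Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cos_sim(a, b):
    # Cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Two unit vectors separated by 0.5 radians
a = [1.0, 0.0]
b = [math.cos(0.5), math.sin(0.5)]
print(abs(l2(a, b) ** 2 - 2 * (1 - cos_sim(a, b))) < 1e-9)  # True
```

Consequently, any distance threshold applied to this store's scores is equivalent to a cosine-similarity threshold on the embeddings.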