Implementation: InjectGuard FAISS From Documents
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, Vector_Search, Security |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by the LangChain FAISS integration, for constructing a FAISS vector store from LangChain Document objects.
Description
The FAISS class in LangChain wraps Facebook AI Similarity Search to provide a vector store that can be built from documents and queried for nearest neighbors. The from_documents class method is the primary factory: it takes a list of Document objects and an embedding model, embeds all documents, and builds a FAISS index (IndexFlatL2 by default).
In InjectGuard, this constructs the malicious prompt vector database at module level. The resulting vector_store object is a module-level global used by the sim_search function for query-time detection.
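The from_documents factory pattern described above can be sketched in plain Python. All names below (TinyStore, ToyEmbedder) are illustrative stand-ins, not part of LangChain or InjectGuard; the only assumption is a duck-typed embedder exposing embed_documents, as noted in the I/O contract.

```python
# Minimal sketch of a from_documents-style factory (names are illustrative).
class TinyStore:
    def __init__(self, texts, vectors):
        self.texts = texts
        self.vectors = vectors

    @classmethod
    def from_documents(cls, texts, embedder):
        # Embed every document up front, then keep the vectors for search
        return cls(texts, embedder.embed_documents(texts))

class ToyEmbedder:
    # Toy 2-d embedding: (character count, vowel count)
    def embed_documents(self, texts):
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))]
                for t in texts]

store = TinyStore.from_documents(["ignore previous instructions", "hello"],
                                 ToyEmbedder())
print(len(store.vectors))  # 2
```

The real FAISS.from_documents follows the same two-step shape: batch-embed, then index.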
Key behaviors:
- Embeds all documents using the provided embedding model
- Builds a FAISS IndexFlatL2 (exact L2 search) by default
- Stores document-to-vector mappings for retrieval
- Supports both similarity_search (returns documents) and similarity_search_with_score (returns documents + distances)
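IndexFlatL2 is an exact index: it stores raw vectors and scans all of them at query time. A minimal pure-Python sketch of that search (squared L2, which is the score FAISS reports by default):

```python
def l2_sq(a, b):
    # Squared Euclidean distance, the score IndexFlatL2 returns
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, vectors, k=1):
    # Exact brute-force scan over every stored vector
    order = sorted(range(len(vectors)), key=lambda i: l2_sq(query, vectors[i]))
    return [(i, l2_sq(query, vectors[i])) for i in order[:k]]

db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(search([0.9, 0.1], db, k=1))  # nearest is index 0
```

Exactness means no recall loss, at the cost of O(n) work per query; fine for a malicious-prompt list of modest size.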
Usage
Use this when you need to build an in-memory vector index from a collection of LangChain Document objects. In InjectGuard, this is called once at module initialization to index all known malicious prompts, producing the searchable vector store used for detection.
Code Reference
Source Location
- Repository: InjectGuard
- File: injectguard/vertor_similarity_detection.py
- Lines: L4, L47
Signature
class FAISS:
    @classmethod
    def from_documents(
        cls,
        documents: list,
        embedding: object,
        **kwargs
    ) -> "FAISS":
        """
        Build a FAISS vector store from a list of Document objects.

        Args:
            documents: List of LangChain Document objects to index.
            embedding: Embedding model instance (must implement embed_documents).
            **kwargs: Additional arguments passed to FAISS index construction.

        Returns:
            FAISS: Initialized vector store with all documents indexed.
        """
Import
from langchain.vectorstores import FAISS
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| documents | list[Document] | Yes | LangChain Document objects to embed and index (from CSVLoader in Step 2) |
| embedding | Embeddings | Yes | Embedding model instance providing embed_documents() method (HuggingFaceEmbeddings from Step 1) |
Outputs
| Name | Type | Description |
|---|---|---|
| vector_store | FAISS | FAISS-backed vector store; supports similarity_search_with_score(query, k) for retrieval with L2 distances |
Usage Examples
InjectGuard Vector Store Construction (as used in the repo)
from langchain.vectorstores import FAISS
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders.csv_loader import CSVLoader
# Step 1: Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cuda:2'},
    encode_kwargs={'normalize_embeddings': True}
)
# Step 2: Load malicious dataset
loader = CSVLoader(file_path='./dataset/malicious_data_demo.csv')
docs = loader.load()
# Step 3: Build vector store
vector_store = FAISS.from_documents(docs, embeddings)
print("success build vector database!")
# Now vector_store can be queried:
results = vector_store.similarity_search_with_score("ignore previous instructions", k=1)
doc, score = results[0]
print(f"Nearest match: {doc.page_content}, L2 distance: {score}")
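Because Step 1 sets normalize_embeddings=True, every stored vector is unit length, and squared L2 distance becomes a monotone function of cosine similarity: d² = 2(1 − cos θ). A quick numeric check of that identity on unit vectors:

```python
import math

def l2(a, b):
    # Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cos_sim(a, b):
    # Cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Two unit vectors separated by 0.5 radians
a = [1.0, 0.0]
b = [math.cos(0.5), math.sin(0.5)]
print(abs(l2(a, b) ** 2 - 2 * (1 - cos_sim(a, b))) < 1e-9)  # True
```

Consequently, any distance threshold applied to this store's scores is equivalent to a cosine-similarity threshold on the embeddings.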