
Implementation:Avdvg InjectGuard FAISS From Documents

From Leeroopedia
Knowledge Sources
Domains Information_Retrieval, Vector_Search, Security
Last Updated 2026-02-14 16:00 GMT

Overview

A concrete tool for constructing a FAISS vector store from LangChain Document objects, provided by the LangChain FAISS integration.

Description

The FAISS class in LangChain wraps Facebook AI Similarity Search to provide a vector store that can be built from documents and queried for nearest neighbors. The from_documents class method is the primary factory: it takes a list of Document objects and an embedding model, embeds all documents, and builds a FAISS index (IndexFlatL2 by default).

In InjectGuard, this call builds the malicious-prompt vector database at module import time. The resulting vector_store object is a module-level global that the sim_search function queries for detection at request time.

Key behaviors:

  • Embeds all documents using the provided embedding model
  • Builds a FAISS IndexFlatL2 (exact L2 search) by default
  • Stores document-to-vector mappings for retrieval
  • Supports both similarity_search (returns documents) and similarity_search_with_score (returns documents + distances)
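As a rough illustration of what "exact L2 search" means here: IndexFlatL2 brute-forces the squared Euclidean distance between the query vector and every stored vector, then returns the k nearest. A numpy sketch of the same computation (the vectors are made up; real ones come from the embedding model):

```python
import numpy as np

# Toy stand-ins for stored document embeddings and a query embedding.
doc_vecs = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.7, 0.7]])
query = np.array([0.9, 0.1])

# IndexFlatL2 computes the squared L2 distance to every stored vector...
sq_dists = np.sum((doc_vecs - query) ** 2, axis=1)

# ...and returns the k nearest indices with their distances.
k = 1
nearest = np.argsort(sq_dists)[:k]
print(int(nearest[0]), round(float(sq_dists[nearest[0]]), 4))  # 0 0.02
```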

Usage

Use this when you need to build an in-memory vector index from a collection of LangChain Document objects. In InjectGuard, this is called once at module initialization to index all known malicious prompts, producing the searchable vector store used for detection.

Code Reference

Source Location

  • Repository: InjectGuard
  • File: injectguard/vertor_similarity_detection.py
  • Lines: L4, L47

Signature

class FAISS:
    @classmethod
    def from_documents(
        cls,
        documents: list,
        embedding: object,
        **kwargs
    ) -> "FAISS":
        """
        Build a FAISS vector store from a list of Document objects.

        Args:
            documents: List of LangChain Document objects to index.
            embedding: Embedding model instance (must implement embed_documents).
            **kwargs: Additional arguments passed to FAISS index construction.

        Returns:
            FAISS: Initialized vector store with all documents indexed.
        """

Import

from langchain.vectorstores import FAISS

I/O Contract

Inputs

  • documents (list[Document], required): LangChain Document objects to embed and index (from CSVLoader in Step 2)
  • embedding (Embeddings, required): embedding model instance providing the embed_documents() method (HuggingFaceEmbeddings from Step 1)

Outputs

  • vector_store (FAISS): FAISS-backed vector store; supports similarity_search_with_score(query, k) for retrieval with L2 distances
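One point worth noting about the returned score: with IndexFlatL2 it is a squared L2 distance, so lower means more similar. Because the embeddings in this pipeline are L2-normalized (normalize_embeddings=True in Step 1), the squared distance maps directly onto cosine similarity via d² = 2 − 2·cos θ. A quick numpy check of that identity, using made-up unit vectors:

```python
import numpy as np

# Two made-up unit vectors standing in for normalized embeddings.
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

cos_sim = float(np.dot(a, b))        # cosine similarity of unit vectors
sq_l2 = float(np.sum((a - b) ** 2))  # score FAISS IndexFlatL2 would return

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(round(sq_l2, 6), round(2 - 2 * cos_sim, 6))  # 0.08 0.08
```

This is why a fixed distance threshold on the score behaves like a fixed cosine-similarity threshold when embeddings are normalized.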

Usage Examples

InjectGuard Vector Store Construction (as used in the repo)

from langchain.vectorstores import FAISS
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders.csv_loader import CSVLoader

# Step 1: Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cuda:2'},
    encode_kwargs={'normalize_embeddings': True}
)

# Step 2: Load malicious dataset
loader = CSVLoader(file_path='./dataset/malicious_data_demo.csv')
docs = loader.load()

# Step 3: Build vector store
vector_store = FAISS.from_documents(docs, embeddings)
print("success build vector database!")

# Now vector_store can be queried:
results = vector_store.similarity_search_with_score("ignore previous instructions", k=1)
doc, score = results[0]
print(f"Nearest match: {doc.page_content}, L2 distance: {score}")

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
