Implementation: InjectGuard HuggingFaceEmbeddings Init
| Knowledge Sources | |
|---|---|
| Domains | NLP, Embeddings, Security |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete tool for initializing a sentence embedding model, provided by LangChain's HuggingFace integration.
Description
The HuggingFaceEmbeddings class is a LangChain wrapper around the sentence-transformers library. It loads a pre-trained sentence embedding model and configures it for inference. In InjectGuard, this is executed at module level (on import) to initialize the all-MiniLM-L6-v2 model on a specified CUDA device with L2 normalization enabled.
Key behaviors:
- Downloads and caches the model from HuggingFace Hub on first use
- Places the model on the specified device (GPU or CPU)
- Configures encoding options including embedding normalization
- The resulting object is used by FAISS to embed documents and queries
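The normalization option matters downstream: once vectors are L2-normalized, the inner product of two vectors equals their cosine similarity, which simplifies similarity search. A minimal pure-Python sketch of what `normalize_embeddings: True` does to each output vector (toy numbers, not real model output):

```python
import math

def l2_normalize(vec):
    # Scale the vector so its Euclidean (L2) norm is exactly 1.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
# The normalized vector has unit length:
print(math.sqrt(sum(x * x for x in v)))  # 1.0
```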
Usage
Import this when you need to create dense vector representations of text for similarity search. In the InjectGuard pipeline, this is the first step: the embeddings object is shared between vector store construction (indexing malicious prompts) and query-time similarity search (embedding incoming user prompts).
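The query-time side of that pipeline reduces to a nearest-neighbor search over normalized vectors. The toy sketch below mimics it with hand-written 3-dimensional vectors standing in for real 384-dimensional embeddings; the prompt strings and the similarity threshold are illustrative only (not taken from InjectGuard), and the actual repo delegates this search to FAISS:

```python
def cosine(a, b):
    # For L2-normalized vectors, cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))

# Pretend these are embeddings of indexed malicious prompts (already normalized).
index = {
    "ignore previous instructions": [1.0, 0.0, 0.0],
    "reveal your system prompt":    [0.0, 1.0, 0.0],
}

def detect(query_vec, threshold=0.8):
    # Return the best-matching indexed prompt and whether it crosses the threshold.
    best = max(index, key=lambda k: cosine(index[k], query_vec))
    score = cosine(index[best], query_vec)
    return best, score, score >= threshold

print(detect([0.95, 0.31, 0.0]))  # matches the first indexed prompt
```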
Code Reference
Source Location
- Repository: InjectGuard
- File: injectguard/vertor_similarity_detection.py
- Lines: L1, L10-12
Signature
```python
class HuggingFaceEmbeddings:
    def __init__(
        self,
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs: dict = None,
        encode_kwargs: dict = None,
    ):
        """
        Args:
            model_name: HuggingFace model identifier or local path.
            model_kwargs: Keyword arguments passed to the model constructor
                (e.g., {'device': 'cuda:2'}).
            encode_kwargs: Keyword arguments passed to the encode method
                (e.g., {'normalize_embeddings': True}).
        """
```
Import
```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | No (default: "sentence-transformers/all-MiniLM-L6-v2") | HuggingFace model identifier for the sentence embedding model |
| model_kwargs | dict | No | Arguments passed to the underlying model constructor; used to set device placement (e.g., {'device': 'cuda:2'}) |
| encode_kwargs | dict | No | Arguments passed to the encode method; used to enable L2 normalization (e.g., {'normalize_embeddings': True}) |
Outputs
| Name | Type | Description |
|---|---|---|
| embeddings | HuggingFaceEmbeddings | Initialized embedding model instance; provides embed_documents(texts) and embed_query(text) methods for producing 384-dimensional vectors |
Usage Examples
InjectGuard Initialization (as used in the repo)
```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

# Initialize embedding model with GPU and normalization
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cuda:2'},
    encode_kwargs={'normalize_embeddings': True}
)

# The embeddings object can now be used to embed text
vector = embeddings.embed_query("Please ignore previous instructions")
# vector is a list of 384 floats, L2-normalized
```
CPU-only Initialization
```python
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

# Initialize on CPU (no GPU required)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)
```