Workflow:Huggingface Peft LoRA Embedding Semantic Search
| Knowledge Sources | |
|---|---|
| Domains | NLP, Fine_Tuning, Embeddings, Semantic_Search |
| Last Updated | 2026-02-07 06:00 GMT |
Overview
End-to-end process for fine-tuning a sentence embedding model for semantic search using LoRA with TaskType.FEATURE_EXTRACTION, training with cosine similarity loss and evaluating with ROC-AUC.
Description
This workflow applies LoRA adapters to a pre-trained encoder model to improve sentence embeddings for semantic search and retrieval tasks. The base encoder model (e.g., BERT, RoBERTa, or a sentence transformer) is wrapped in a custom embedding model that performs mean pooling over token embeddings and L2 normalization to produce fixed-size sentence vectors. LoRA adapters are applied to the attention layers using the FEATURE_EXTRACTION task type. The model is trained with a cosine similarity loss that maximizes similarity between matched pairs (e.g., query and relevant product) and minimizes similarity between unmatched pairs. Training uses Accelerate for distributed support, and evaluation uses the ROC-AUC metric.
Usage
Execute this workflow when you need to adapt a pre-trained language model to produce better sentence embeddings for a domain-specific semantic search or retrieval task. This is common in e-commerce product search (matching queries to product titles), document retrieval, FAQ matching, or any scenario where you need to compute similarity between text pairs using vector representations.
Execution Steps
Step 1: Load Base Encoder and Build Embedding Model
Load a pre-trained encoder model and tokenizer from the Hugging Face Hub. Wrap the encoder in a custom sentence embedding model that applies mean pooling over the token-level outputs (weighted by attention mask) and L2 normalization to produce unit-length embedding vectors. This custom wrapper is necessary because the base model outputs token-level representations, not sentence-level embeddings.
Key considerations:
- The custom wrapper performs mean pooling: average token embeddings weighted by attention mask
- L2 normalization ensures all embeddings lie on the unit hypersphere
- This enables cosine similarity to be computed as a simple dot product
- The wrapper inherits from nn.Module and exposes the underlying model for PEFT injection
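The pooling and normalization described above can be sketched as follows. This is a minimal illustration, not the workflow's actual wrapper: the class names `SentenceEmbedder` and `ToyEncoder` are hypothetical, and `ToyEncoder` is only a stand-in for a Hugging Face encoder so that the sketch runs without downloading a model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceEmbedder(nn.Module):
    """Hypothetical wrapper: masked mean pooling followed by L2 normalization."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.model = encoder  # kept as an attribute so PEFT can wrap it later

    def forward(self, input_ids, attention_mask):
        # Index 0 of the encoder output holds the token-level hidden states.
        token_states = self.model(input_ids=input_ids, attention_mask=attention_mask)[0]
        mask = attention_mask.unsqueeze(-1).to(token_states.dtype)
        summed = (token_states * mask).sum(dim=1)   # padding positions contribute 0
        counts = mask.sum(dim=1).clamp(min=1e-9)    # number of real tokens per row
        return F.normalize(summed / counts, p=2, dim=1)


class ToyEncoder(nn.Module):
    """Stand-in encoder (assumption): returns a tuple of token-level states."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids, attention_mask=None):
        return (self.emb(input_ids),)


torch.manual_seed(0)
embedder = SentenceEmbedder(ToyEncoder())
input_ids = torch.tensor([[5, 6, 7, 0], [8, 9, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
embeddings = embedder(input_ids, attention_mask)  # shape: (2, 16), unit length
```

Because padded positions are masked out before averaging, changing a padding token ID leaves the sentence vector unchanged.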
Step 2: Apply LoRA for Feature Extraction
Configure LoRA with TaskType.FEATURE_EXTRACTION and apply it to the encoder model using get_peft_model. Target the query, key, and value projection layers in the attention mechanism. Register custom Accelerate save and load hooks for proper PEFT checkpoint handling during distributed training.
Key considerations:
- TaskType.FEATURE_EXTRACTION tells PEFT this is an embedding/encoding task
- Target modules should match the encoder architecture (e.g., key, query, value for BERT)
- Custom save/load hooks ensure only adapter weights are checkpointed
- The embedding wrapper's pooling and normalization layers are not modified by LoRA
Step 3: Prepare Paired Training Data
Load the semantic search dataset containing text pairs (e.g., queries and product titles) with relevance labels. Tokenize each text independently to produce separate input tensors for the anchor (query) and positive/negative (product) texts. Create DataLoaders with appropriate batching and shuffling.
Key considerations:
- Each training example contains two texts (e.g., query and product title) and a relevance label
- Texts are tokenized independently since they will be encoded separately
- Prefix query and product texts with descriptive headers if using a bi-encoder approach
- Balance positive and negative examples in each batch for stable training
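The independent tokenization of the two texts can be sketched as below. The column names `query`, `product_title`, and `relevance_label`, the `max_length` value, and the function name `preprocess_batch` are assumptions for illustration. `toy_tokenizer` is a whitespace-splitting stand-in with the same call shape as a Hugging Face tokenizer, used only so the sketch runs without downloading a vocabulary.

```python
def preprocess_batch(examples, tokenizer, max_length=70):
    """Tokenize queries and product titles separately into prefixed columns."""
    query_enc = tokenizer(
        examples["query"], padding="max_length", max_length=max_length, truncation=True
    )
    product_enc = tokenizer(
        examples["product_title"], padding="max_length", max_length=max_length, truncation=True
    )
    features = {f"query_{key}": value for key, value in query_enc.items()}
    features.update({f"product_{key}": value for key, value in product_enc.items()})
    features["labels"] = examples["relevance_label"]
    return features


def toy_tokenizer(texts, padding="max_length", max_length=8, truncation=True):
    """Stand-in tokenizer (assumption): whitespace split, fixed-length padding."""
    input_ids, attention_mask = [], []
    for text in texts:
        ids = [sum(map(ord, word)) % 1000 + 1 for word in text.lower().split()][:max_length]
        pad = max_length - len(ids)
        input_ids.append(ids + [0] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = {
    "query": ["red running shoes", "usb c cable"],
    "product_title": ["Men's red trail running shoe", "Stainless steel water bottle"],
    "relevance_label": [1, 0],
}
features = preprocess_batch(batch, toy_tokenizer, max_length=8)
```

In practice the real tokenizer loaded in Step 1 would be passed in place of `toy_tokenizer`, and the resulting columns would be fed to a DataLoader.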
Step 4: Train with Cosine Similarity Loss
Run the training loop using Accelerate. For each batch, encode the anchor and positive/negative texts through the embedding model independently to get their embeddings. Compute cosine similarity between embedding pairs and apply a loss function that pushes matching pairs together and non-matching pairs apart. Support gradient accumulation to reach larger effective batch sizes.
Key considerations:
- Compute embeddings separately for each text in the pair (bi-encoder architecture)
- Cosine similarity between unit-normalized vectors equals their dot product
- The loss function penalizes low similarity for positive pairs and high similarity for negative pairs
- Gradient accumulation simulates larger batch sizes when memory is limited, which can improve contrastive learning
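One possible form of the loss and the accumulation loop is sketched below. The squared pairwise penalty is an assumption about the exact loss shape, chosen because it matches the description above (it penalizes low similarity for positives and positive similarity for negatives). The `Linear` layer is a toy stand-in for the LoRA-wrapped embedding model, and the loop uses plain PyTorch rather than Accelerate to stay self-contained.

```python
import torch
import torch.nn.functional as F


def cosine_scores(query_embs, product_embs):
    """Row-wise dot product, which equals cosine similarity for unit vectors."""
    return (query_embs * product_embs).sum(dim=1)


def pair_loss(scores, labels):
    """Squared penalty: positives are pulled toward 1, negatives pushed to <= 0."""
    positive_term = labels * (1.0 - scores)
    negative_term = (1.0 - labels) * scores.clamp(min=0.0)
    return (positive_term + negative_term).pow(2).mean()


torch.manual_seed(0)
projection = torch.nn.Linear(8, 4)  # toy stand-in for the embedding model
optimizer = torch.optim.SGD(projection.parameters(), lr=0.1)
queries, products = torch.randn(16, 8), torch.randn(16, 8)
labels = torch.tensor([1.0, 0.0] * 8)

accumulation_steps = 2  # two micro-batches of 4 act like one batch of 8
optimizer_steps = 0
for step, idx in enumerate(torch.arange(16).split(4)):
    # Encode each side of the pair separately (bi-encoder).
    query_embs = F.normalize(projection(queries[idx]), dim=1)
    product_embs = F.normalize(projection(products[idx]), dim=1)
    loss = pair_loss(cosine_scores(query_embs, product_embs), labels[idx])
    (loss / accumulation_steps).backward()  # gradients add up across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

Dividing each micro-batch loss by `accumulation_steps` keeps the accumulated gradient on the same scale as a single larger batch.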
Step 5: Evaluate with ROC-AUC
After each training epoch, evaluate the model on the validation set by computing embeddings for all text pairs, calculating cosine similarities, and measuring ROC-AUC (Area Under the Receiver Operating Characteristic Curve). ROC-AUC measures how well the model discriminates between relevant and irrelevant pairs across all similarity thresholds.
Key considerations:
- ROC-AUC is threshold-independent and measures ranking quality
- Gather predictions across distributed processes before computing the metric
- Higher ROC-AUC indicates better discrimination between relevant and irrelevant pairs
- Save the model checkpoint when the best validation ROC-AUC is achieved
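To make the metric concrete, ROC-AUC can be read as the probability that a randomly chosen relevant pair receives a higher similarity score than a randomly chosen irrelevant pair. The sketch below computes it by direct pairwise comparison and tracks the best epoch. The function name `roc_auc` and the validation scores are made up for illustration, and a real run would more likely call an existing metric library on the gathered predictions.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Illustrative validation labels and per-epoch cosine similarities (made up).
val_labels = [1, 0, 1, 0]
scores_per_epoch = [
    [0.9, 0.2, 0.4, 0.6],  # one relevant pair ranked below an irrelevant one
    [0.9, 0.2, 0.7, 0.6],  # all relevant pairs ranked above irrelevant ones
]

best_auc, best_epoch = float("-inf"), None
for epoch, scores in enumerate(scores_per_epoch):
    auc = roc_auc(val_labels, scores)
    if auc > best_auc:
        # This is the point where the adapter checkpoint would be saved.
        best_auc, best_epoch = auc, epoch
```

No similarity threshold appears anywhere in the computation, which is why the metric reflects ranking quality rather than a particular decision cutoff.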
Step 6: Save Adapter and Deploy
Save the trained LoRA adapter weights. At inference time, load the base encoder, apply the adapter, wrap in the embedding model, and use it to encode queries and documents into embeddings for similarity-based retrieval. The adapter can be pushed to the Hugging Face Hub for sharing.
Key considerations:
- The adapter is a small checkpoint that adapts the encoder for the specific search domain
- At inference, encode queries and documents into embeddings, then compute cosine similarity
- Build a vector index (e.g., FAISS) over document embeddings for efficient retrieval
- Multiple domain-specific adapters can be swapped for different search verticals
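The retrieval step at inference can be sketched as an exact top-k search over unit-length embeddings. This is a brute-force stand-in for a vector index such as FAISS, written in plain PyTorch. The function name `top_k` and the three-dimensional vectors are illustrative assumptions. In a real deployment the embeddings would come from the adapter-equipped embedding model.

```python
import torch
import torch.nn.functional as F


def top_k(query_embs, doc_embs, k=2):
    """Exact search: dot products of unit vectors are cosine similarities."""
    scores = query_embs @ doc_embs.T  # shape: (num_queries, num_docs)
    return torch.topk(scores, k=k, dim=1)


# Illustrative document and query vectors (made up), normalized to unit length.
doc_embs = F.normalize(
    torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]), dim=1
)
query_embs = F.normalize(torch.tensor([[0.9, 0.1, 0.0]]), dim=1)

values, indices = top_k(query_embs, doc_embs, k=2)
# indices[0] lists the two closest documents, ordered by descending similarity
```

For large document collections, an approximate or inner-product index would replace the dense matrix product, while the query-side encoding stays the same.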