Principle:Run llama Llama index Embedding Finetune Configuration
Overview
Embedding Finetune Configuration covers the design decisions and setup involved in configuring an embedding model for finetuning. This includes choosing between full Sentence Transformers finetuning and adapter-based finetuning, selecting appropriate loss functions, configuring training hyperparameters, and preparing the training infrastructure.
Concept: Sentence Transformers Finetuning vs Adapter-Based Finetuning
LlamaIndex supports two distinct approaches to embedding finetuning:
| Approach | Description | When to Use |
|---|---|---|
| Sentence Transformers (Full) | Finetunes all parameters of a Sentence Transformer model on domain-specific data | When you have sufficient training data and want maximum performance improvement |
| Adapter-Based | Freezes the base embedding model and trains a lightweight adapter layer on top | When you want to preserve the base model's general capabilities while adding domain specialization |
Full Finetuning
Full finetuning modifies all model weights. The SentenceTransformersFinetuneEngine loads a pretrained Sentence Transformer model (e.g., BAAI/bge-small-en) and trains it end-to-end on query-document pairs. This approach:
- Provides the most flexibility for domain adaptation
- Requires more training data for good generalization
- Produces a self-contained model that can be loaded directly
Adapter-Based Finetuning
Adapter finetuning adds a small trainable layer (typically a linear transformation) on top of frozen base embeddings. The EmbeddingAdapterFinetuneEngine embeds all queries and documents using the base model first, then trains the adapter to transform these embeddings for better retrieval. This approach:
- Requires less training data
- Preserves the base model's general-purpose capabilities
- Results in a smaller additional model artifact (just the adapter weights)
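As a rough illustration of the adapter idea described above, the sketch below applies a linear map to a frozen base embedding. The helper names identity_matrix and apply_linear_adapter are hypothetical and not part of LlamaIndex, and starting from an identity matrix is an assumption made here so that the untrained adapter leaves embeddings unchanged.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity_matrix(dim: int) -> Matrix:
    """Return a dim x dim identity matrix (assumed starting point for the adapter)."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_linear_adapter(weights: Matrix, embedding: Vector) -> Vector:
    """Transform a frozen base embedding with the adapter: y = W x."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]


# A frozen base embedding; the base model that produced it is never updated.
base_embedding = [0.2, -0.5, 0.7]

# Untrained adapter (identity): the output equals the input.
adapter = identity_matrix(3)
unchanged = apply_linear_adapter(adapter, base_embedding)

# Training would adjust only these adapter weights; one is nudged by hand here.
adapter[0][1] = 0.1
adapted = apply_linear_adapter(adapter, base_embedding)
```

Because only the matrix is trainable, the artifact saved after training is limited to those weights, which is what keeps the adapter small relative to a fully finetuned model.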
Concept: Loss Functions for Embedding Training
The choice of loss function is critical for embedding finetuning quality:
MultipleNegativesRankingLoss (Default)
This is the default loss function used by SentenceTransformersFinetuneEngine. It implements a form of InfoNCE (Information Noise-Contrastive Estimation) loss:
- Given a batch of (query, positive_document) pairs, it treats all other documents in the batch as negatives
- The loss encourages the model to rank the positive document higher than all in-batch negatives
- No explicit negative mining is needed -- negatives come "for free" from the batch
The mathematical formulation is:
L = -log( exp(sim(q, d+)) / sum_i(exp(sim(q, d_i))) )
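where sim is cosine similarity, d+ is the positive document for query q, and d_i ranges over all documents in the batch, including d+. The Sentence Transformers implementation additionally multiplies the similarity by a scale factor (its scale argument) before the softmax.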
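The formula above can be worked through on a toy batch. The following is a minimal pure-Python sketch, not the Sentence Transformers implementation: the function names are hypothetical, the loss is averaged over the batch as an assumption, and the scale argument is an illustrative knob left at 1.0 to match the formula as written.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_negatives_loss(
    queries: List[Vector], docs: List[Vector], scale: float = 1.0
) -> float:
    """Mean of -log softmax(sim(q, d+)) over the batch.

    docs[i] is the positive for queries[i]; every other document in the
    batch acts as a negative for that query.
    """
    total = 0.0
    for i, query in enumerate(queries):
        scores = [scale * cosine_similarity(query, doc) for doc in docs]
        log_denominator = math.log(sum(math.exp(s) for s in scores))
        total += -(scores[i] - log_denominator)
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[0.9, 0.1], [0.1, 0.9]]   # each positive points toward its query
swapped_docs = [[0.1, 0.9], [0.9, 0.1]]   # each positive points toward the other query

loss_aligned = in_batch_negatives_loss(queries, aligned_docs)
loss_swapped = in_batch_negatives_loss(queries, swapped_docs)
```

The aligned batch yields the lower loss, which is the behavior the ranking objective rewards: the positive document scores higher than the in-batch negatives.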
Custom Loss Functions
Users can provide any Sentence Transformers-compatible loss function via the loss parameter. Common alternatives include:
- CosineSimilarityLoss -- For when you have explicit similarity scores
- TripletLoss -- When you have explicit (anchor, positive, negative) triplets
- ContrastiveLoss -- For binary similar/dissimilar pairs
Concept: Key Hyperparameters
| Hyperparameter | Default | Impact |
|---|---|---|
| model_id | "BAAI/bge-small-en" | The pretrained model to start from. Larger models generally perform better but require more resources. |
| batch_size | 10 | Larger batches provide more in-batch negatives for MultipleNegativesRankingLoss, potentially improving quality. |
| epochs | 2 | Number of passes through the training data. Too many epochs can lead to overfitting on small datasets. |
| evaluation_steps | 50 | How often to evaluate on the validation set during training. |
| use_all_docs | False | If True, creates training pairs for all relevant documents per query (not just the first). |
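To make the effect of use_all_docs concrete, here is a small sketch of how (query, document) pairs could be assembled. It is an illustration under assumptions rather than the engine's actual code: build_training_pairs is a hypothetical helper, and the dataset is modeled as three plain dictionaries (queries, corpus, relevant_docs) with made-up contents.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    queries: Dict[str, str],
    relevant_docs: Dict[str, List[str]],
    corpus: Dict[str, str],
    use_all_docs: bool = False,
) -> List[Tuple[str, str]]:
    """Pair each query with its first relevant document, or with all of them."""
    pairs = []
    for query_id, query_text in queries.items():
        doc_ids = relevant_docs[query_id]
        if not use_all_docs:
            doc_ids = doc_ids[:1]  # keep only the first relevant document
        for doc_id in doc_ids:
            pairs.append((query_text, corpus[doc_id]))
    return pairs


# Illustrative data only.
queries = {"q1": "How is warmup computed?", "q2": "What is the default loss?"}
corpus = {"d1": "Warmup text A", "d2": "Warmup text B", "d3": "Loss text"}
relevant_docs = {"q1": ["d1", "d2"], "q2": ["d3"]}

first_only = build_training_pairs(queries, relevant_docs, corpus)
all_docs = build_training_pairs(queries, relevant_docs, corpus, use_all_docs=True)

# With in-batch negatives, each query sees batch_size - 1 negatives.
negatives_per_query = 10 - 1
```

In this toy case the default setting produces two pairs and use_all_docs=True produces three, since q1 has two relevant documents.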
Concept: Warmup Steps
Warmup steps are automatically calculated as 10% of total training steps:
warmup_steps = int(len(data_loader) * epochs * 0.1)
During warmup, the learning rate gradually increases from zero to the target learning rate. This prevents destabilizing the pretrained weights with large initial gradient updates.
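The warmup rule can be checked with a few lines of arithmetic. In the sketch below, compute_warmup_steps mirrors the formula above, while linear_warmup_lr is a hypothetical helper that models only the ramp; holding the rate constant after warmup is a simplification, and the target learning rate of 2e-5 is an assumed value for illustration.

```python
def compute_warmup_steps(num_batches: int, epochs: int) -> int:
    """10% of the total number of training steps, truncated to an integer."""
    return int(num_batches * epochs * 0.1)


def linear_warmup_lr(step: int, warmup_steps: int, target_lr: float) -> float:
    """Learning rate during a linear ramp from zero to target_lr.

    Simplification: after warmup the rate is held at target_lr, whereas a
    real scheduler may decay it.
    """
    if warmup_steps == 0 or step >= warmup_steps:
        return target_lr
    return target_lr * step / warmup_steps


# 1,000 training pairs at batch_size=10 give 100 batches per epoch.
num_batches = 1000 // 10
warmup_steps = compute_warmup_steps(num_batches, epochs=2)

target_lr = 2e-5  # assumed value, for illustration only
lr_start = linear_warmup_lr(0, warmup_steps, target_lr)
lr_halfway = linear_warmup_lr(warmup_steps // 2, warmup_steps, target_lr)
lr_after = linear_warmup_lr(warmup_steps, warmup_steps, target_lr)
```

Note that very small datasets can truncate to zero warmup steps (for example, 4 batches over 2 epochs gives int(0.8) = 0), in which case training starts at the full learning rate.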
Concept: Validation and Evaluation
When a val_dataset is provided, an InformationRetrievalEvaluator is created to measure retrieval quality during training. This evaluator:
- Uses the validation queries, corpus, and relevance judgments
- Computes standard information retrieval metrics (e.g., MRR, NDCG, MAP)
- Runs at intervals defined by evaluation_steps
This allows monitoring for overfitting and selecting the best checkpoint.
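As an example of the kind of metric reported, the sketch below computes mean reciprocal rank (MRR) from ranked retrieval results and relevance judgments. It is a standalone illustration, not the InformationRetrievalEvaluator itself; the function name and the sample rankings are made up.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    ranked_results: Dict[str, List[str]],
    relevant_docs: Dict[str, List[str]],
) -> float:
    """Average of 1 / rank of the first relevant document per query.

    A query with no relevant document in its ranking contributes 0.
    """
    total = 0.0
    for query_id, ranked_doc_ids in ranked_results.items():
        relevant = set(relevant_docs[query_id])
        for rank, doc_id in enumerate(ranked_doc_ids, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Illustrative rankings: q1 finds its document at rank 2, q2 at rank 1.
ranked_results = {"q1": ["d2", "d1", "d3"], "q2": ["d3", "d1", "d2"]}
relevant_docs = {"q1": ["d1"], "q2": ["d3"]}

mrr = mean_reciprocal_rank(ranked_results, relevant_docs)  # (1/2 + 1/1) / 2
```

Tracking a metric like this on held-out queries across evaluation intervals is what makes overfitting visible: training loss keeps falling while the validation score stalls or drops.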
Concept: Checkpoint Management
For long-running finetuning jobs, checkpointing is controlled by the following parameters:
- save_checkpoints -- Enable/disable checkpoint saving
- checkpoint_save_steps -- Save a checkpoint every N training steps
- checkpoint_save_total_limit -- Maximum number of checkpoints to keep (0 = unlimited)
- resume_from_checkpoint -- Resume training from the latest checkpoint
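The interaction between checkpoint_save_steps and checkpoint_save_total_limit can be sketched as follows. This is an illustration of the semantics listed above, not the library's code: retained_checkpoints is a hypothetical helper, and it assumes that the oldest checkpoints are the ones discarded once the limit is reached.

```python
from typing import List


def retained_checkpoints(
    total_steps: int, save_steps: int, save_total_limit: int = 0
) -> List[int]:
    """Return the training steps whose checkpoints remain on disk.

    Assumption: when a limit is set, the oldest checkpoints are removed first.
    A limit of 0 means no checkpoints are removed.
    """
    saved = list(range(save_steps, total_steps + 1, save_steps))
    if save_total_limit > 0:
        saved = saved[-save_total_limit:]
    return saved


# A 2,000-step run that saves every 500 steps.
unlimited = retained_checkpoints(2000, 500, save_total_limit=0)
last_two = retained_checkpoints(2000, 500, save_total_limit=2)
```

Under these assumptions, an unlimited run keeps all four checkpoints, while a limit of 2 keeps only those from steps 1500 and 2000.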
Knowledge Sources
- LlamaIndex Embedding Finetuning Guide
- Sentence Transformers Training Overview
- MultipleNegativesRankingLoss
Metadata
Machine Learning, Embeddings, Finetuning, Contrastive Learning, LlamaIndex
Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Init
Heuristic:Run_llama_Llama_index_Finetuning_Warmup_Steps