Principle:Run llama Llama index Embedding Finetune Execution
Overview
Embedding Finetune Execution covers the actual training process for embedding models. This includes the training loop mechanics, contrastive loss optimization, warmup scheduling, evaluation during training, and checkpoint management. Understanding these concepts is essential for successfully finetuning embedding models and diagnosing training issues.
Concept: The Training Loop
Embedding finetuning follows a standard supervised training loop with some domain-specific characteristics:
- Forward pass -- Encode queries and documents through the embedding model
- Loss computation -- Calculate contrastive loss between query and document embeddings
- Backward pass -- Compute gradients via backpropagation
- Parameter update -- Update model weights using the optimizer
- Evaluation -- Periodically assess retrieval quality on validation data
- Checkpointing -- Periodically save model state
The Sentence Transformers library abstracts this loop via model.fit(), which handles all these steps internally.
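The ordering of the steps above can be sketched as a schematic schedule. This is an illustration only, not the Sentence Transformers implementation: `training_schedule` is a hypothetical helper, and it assumes evaluation and checkpointing are triggered by a global step counter, which a given library version may count differently.

```python
def training_schedule(steps_per_epoch, epochs, evaluation_steps, checkpoint_save_steps):
    """Return the ordered list of events in a schematic finetuning run."""
    events = []
    global_step = 0
    for _epoch in range(epochs):
        for _batch in range(steps_per_epoch):
            # One training step stands for: forward pass, loss computation,
            # backward pass, and parameter update.
            global_step += 1
            events.append(("train_step", global_step))
            # Periodic evaluation on validation data (assumed global counter).
            if evaluation_steps > 0 and global_step % evaluation_steps == 0:
                events.append(("evaluate", global_step))
            # Periodic checkpointing (assumed global counter).
            if checkpoint_save_steps > 0 and global_step % checkpoint_save_steps == 0:
                events.append(("checkpoint", global_step))
    return events


events = training_schedule(
    steps_per_epoch=4, epochs=2, evaluation_steps=3, checkpoint_save_steps=4
)
```

Filtering `events` by kind shows how often each auxiliary step fires relative to the training steps.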
Concept: Contrastive Loss Optimization
The default MultipleNegativesRankingLoss operates on mini-batches of (query, positive_document) pairs:
- For a batch of size N, each query has 1 positive document and N-1 in-batch negatives
- The loss is the negative log-likelihood (softmax cross-entropy) of selecting the correct document among the N documents in the batch
- Larger batch sizes provide more negative examples, generally improving training quality
- With in-batch negatives, this has the same form as the InfoNCE loss used in contrastive learning frameworks
The optimization objective encourages:
- High similarity between a query and its relevant document
- Low similarity between a query and all other documents in the batch
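A minimal pure-Python sketch of an in-batch-negatives loss may make the objective concrete. It assumes cosine similarity multiplied by a scale factor of 20 (to my knowledge the Sentence Transformers default, but treat it as an assumption here); `in_batch_negatives_loss` is a hypothetical name, and real training operates on tensors rather than Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(queries, documents, scale=20.0):
    """Mean cross-entropy where documents[i] is the positive for queries[i]."""
    losses = []
    for i, query in enumerate(queries):
        # Score the query against every document in the batch.
        logits = [scale * cosine(query, doc) for doc in documents]
        # Numerically stable log-sum-exp over all N candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching document.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive points the same way as its query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives and negatives exchanged

loss_aligned = in_batch_negatives_loss(queries, aligned)
loss_swapped = in_batch_negatives_loss(queries, swapped)
```

The aligned batch yields a much smaller loss than the swapped one. When all documents are identical the loss equals log N, the value for a model that cannot tell the candidates apart.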
Concept: Learning Rate Warmup
Warmup is a critical technique when finetuning pretrained models:
- Problem -- Large initial learning rates can destroy pretrained representations
- Solution -- Gradually increase the learning rate from zero to the target value over the first warmup_steps training steps
- LlamaIndex default -- Warmup covers 10% of total training steps
The number of warmup steps is computed as:
warmup_steps = int(len(data_loader) * epochs * 0.1)
After warmup, a linear decay schedule is typically applied for the remainder of training.
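The shape of such a schedule can be sketched as a multiplier applied to the target learning rate. This is a sketch under stated assumptions: `warmup_linear_multiplier` is a hypothetical helper, `batches_per_epoch` stands in for `len(data_loader)`, and the target learning rate of 2e-5 is purely illustrative.

```python
def warmup_linear_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier in [0, 1]: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to 1 across the warmup phase.
        return step / max(1, warmup_steps)
    # Decay from 1 back to 0 across the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


batches_per_epoch = 50                                   # stands in for len(data_loader)
epochs = 2
total_steps = batches_per_epoch * epochs
warmup_steps = int(batches_per_epoch * epochs * 0.1)     # 10% of all steps
target_lr = 2e-5                                         # illustrative value

learning_rates = [
    target_lr * warmup_linear_multiplier(step, warmup_steps, total_steps)
    for step in range(total_steps + 1)
]
```

The learning rate starts at zero, reaches `target_lr` exactly at `warmup_steps`, and returns to zero at the final step.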
Concept: Training Objectives Format
The Sentence Transformers model.fit() method accepts training objectives as a list of (DataLoader, loss_function) tuples:
train_objectives = [(self.loader, self.loss)]
This design allows for multi-task training where different data loaders use different loss functions. In LlamaIndex's embedding finetuning, a single objective (the QA pairs with contrastive loss) is used.
Concept: Evaluation During Training
When a validation dataset is provided, the InformationRetrievalEvaluator measures retrieval quality at regular intervals:
- Metrics computed -- MRR (Mean Reciprocal Rank), NDCG (Normalized Discounted Cumulative Gain), MAP (Mean Average Precision)
- Evaluation frequency -- Controlled by evaluation_steps (default: every 50 steps)
- Purpose -- Detect overfitting, select the best checkpoint, and monitor convergence
Evaluation runs the current model on the validation queries, retrieves documents, and compares against the ground truth relevance judgments.
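For a single query with binary relevance labels, the three metrics can be sketched as below. These are textbook definitions written as hypothetical helpers, not the InformationRetrievalEvaluator's code; the library computes them at fixed cutoffs k and may normalize differently. Averaging each value over all validation queries gives MRR, MAP, and mean NDCG.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@rank taken at each rank holding a relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def ndcg(ranked_ids, relevant_ids):
    """Binary-relevance NDCG over the full ranked list."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids, start=1)
        if doc_id in relevant_ids
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant_ids), len(ranked_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7"]   # model output, best match first
relevant = {"d1"}             # ground-truth relevance judgment

rr = reciprocal_rank(ranked, relevant)
ap = average_precision(ranked, relevant)
gain = ndcg(ranked, relevant)
```

Here the only relevant document sits at rank 2, so the reciprocal rank and average precision are both 0.5, and NDCG is discounted by the logarithmic position penalty.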
Concept: Checkpoint Management During Training
For long-running finetuning jobs, checkpointing provides fault tolerance and model selection:
- Checkpoint path -- Stored under checkpoints/{model_output_path}/
- Save frequency -- Controlled by checkpoint_save_steps
- Storage limit -- checkpoint_save_total_limit prevents disk exhaustion (0 means keep all)
- Resume capability -- Set resume_from_checkpoint=True to continue from the latest checkpoint
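The effect of a storage limit can be illustrated with a small pruning rule. This sketch assumes the oldest checkpoints are removed first and that a limit of 0 keeps everything, as stated above; `checkpoints_to_delete` is a hypothetical helper and not the library's actual cleanup code.

```python
def checkpoints_to_delete(saved_steps, save_total_limit):
    """Return the checkpoint steps to remove, oldest first."""
    if save_total_limit <= 0:
        # A limit of 0 means every checkpoint is kept.
        return []
    ordered = sorted(saved_steps)
    excess = len(ordered) - save_total_limit
    # Drop only as many of the oldest checkpoints as exceed the limit.
    return ordered[:excess] if excess > 0 else []


stale = checkpoints_to_delete([100, 200, 300, 400], save_total_limit=2)
```

With four saved checkpoints and a limit of two, the two earliest steps are selected for removal and the two most recent remain available for resuming.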
Concept: Training Duration Considerations
| Factor | Consideration |
|---|---|
| Dataset size | Larger datasets generally need fewer epochs to converge and carry less risk of overfitting per epoch |
| Epochs | 2-5 epochs is typical for embedding finetuning; more risks overfitting |
| Batch size | Affects both training speed and quality of in-batch negatives |
| Evaluation frequency | Too frequent evaluation slows training; too infrequent misses optimal checkpoints |
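The interaction of these factors comes down to simple arithmetic, sketched below. `step_budget` is a hypothetical helper; it assumes the final partial batch is kept (hence the ceiling) and that evaluations are counted on a global step counter.

```python
import math


def step_budget(num_pairs, batch_size, epochs, evaluation_steps):
    """Derive step counts for a finetuning run from its basic settings."""
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Every other document in the batch serves as a negative.
        "negatives_per_query": batch_size - 1,
        # Step-triggered evaluations only (assumed global counter).
        "evaluations": total_steps // evaluation_steps,
    }


budget = step_budget(num_pairs=1000, batch_size=10, epochs=2, evaluation_steps=50)
```

For 1000 QA pairs, a batch size of 10, and 2 epochs, this gives 200 optimizer steps, 9 in-batch negatives per query, and 4 step-triggered evaluations at the default interval of 50.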
Concept: Adapter Finetuning Execution
The adapter-based approach (EmbeddingAdapterFinetuneEngine) has a different execution path:
- Base embeddings are pre-computed and frozen
- Only the adapter layer (e.g., a linear transformation) is trained
- Training uses a custom train_model function rather than Sentence Transformers' model.fit()
- The loss operates on transformed query embeddings versus raw document embeddings
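The asymmetry between transformed queries and raw documents can be sketched with a plain linear map. Everything here is illustrative: the helper names, the 2-dimensional vectors, and the example weight matrices are assumptions, and the actual adapter in LlamaIndex is a trained neural network layer rather than a hand-written matrix.

```python
def apply_linear_adapter(weight, embedding):
    """Multiply a square weight matrix (list of rows) by an embedding vector."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def dot(u, v):
    """Dot-product similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v))


query = [0.6, 0.8]      # frozen base embedding of a query
document = [0.8, 0.6]   # frozen base embedding of its relevant document

identity = [[1.0, 0.0], [0.0, 1.0]]   # an adapter that changes nothing
swap = [[0.0, 1.0], [1.0, 0.0]]       # hypothetical "trained" adapter

score_before = dot(query, document)
# Only the query passes through the adapter; the document stays raw.
score_identity = dot(apply_linear_adapter(identity, query), document)
score_adapted = dot(apply_linear_adapter(swap, query), document)
```

With the identity matrix the score is unchanged, while the second matrix moves the query onto the document and raises the score. Because only the small matrix changes, documents never need to be re-embedded after training.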
Knowledge Sources
- LlamaIndex Embedding Finetuning Guide
- Sentence Transformers Training Overview
Metadata
Machine Learning, Embeddings, Finetuning, Contrastive Learning, LlamaIndex
Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Finetune