Principle:Run llama Llama index Embedding Finetune Execution

Overview

Embedding Finetune Execution covers the actual training process for embedding models. This includes the training loop mechanics, contrastive loss optimization, warmup scheduling, evaluation during training, and checkpoint management. Understanding these concepts is essential for successfully finetuning embedding models and diagnosing training issues.

Concept: The Training Loop

Embedding finetuning follows a standard supervised training loop with some domain-specific characteristics:

Forward pass -- Encode queries and documents through the embedding model
Loss computation -- Calculate contrastive loss between query and document embeddings
Backward pass -- Compute gradients via backpropagation
Parameter update -- Update model weights using the optimizer
Evaluation -- Periodically assess retrieval quality on validation data
Checkpointing -- Periodically save model state

The Sentence Transformers library abstracts this loop via model.fit(), which handles all these steps internally.
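
The ordering of these steps can be sketched with a toy loop. This is an illustration only, not the Sentence Transformers implementation: it fits a single scalar weight with squared error instead of an embedding model with contrastive loss, and the function name run_training_loop and its parameters are hypothetical.

```python
def run_training_loop(batches, epochs, evaluation_steps, checkpoint_save_steps, lr=0.1):
    """Toy loop: fit scalar w so that w * x matches y, and record when
    evaluation and checkpointing would fire (nothing is actually saved)."""
    w = 0.0
    events = []
    global_step = 0
    for _ in range(epochs):
        for xs, ys in batches:
            # Forward pass
            preds = [w * x for x in xs]
            # Loss computation (mean squared error stands in for contrastive loss)
            loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
            # Backward pass: gradient of the loss with respect to w
            grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
            # Parameter update (plain gradient descent)
            w -= lr * grad
            global_step += 1
            # Periodic evaluation
            if evaluation_steps and global_step % evaluation_steps == 0:
                events.append(("eval", global_step))
            # Periodic checkpointing
            if checkpoint_save_steps and global_step % checkpoint_save_steps == 0:
                events.append(("checkpoint", global_step))
    return w, events


toy_batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
final_w, events = run_training_loop(
    toy_batches, epochs=20, evaluation_steps=10, checkpoint_save_steps=20
)
```

With 2 batches and 20 epochs the loop runs 40 steps, so evaluation is recorded at steps 10, 20, 30, and 40 and checkpointing at steps 20 and 40.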

Concept: Contrastive Loss Optimization

The default MultipleNegativesRankingLoss operates on mini-batches of (query, positive_document) pairs:

For a batch of size N, each query has 1 positive document and N-1 in-batch negatives
The loss minimizes the negative log-likelihood of selecting the correct document
Larger batch sizes provide more negative examples, generally improving training quality
This is equivalent to the InfoNCE loss used in contrastive learning frameworks

The optimization objective encourages:

High similarity between a query and its relevant document
Low similarity between a query and all other documents in the batch
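
The objective above can be written out as a minimal pure-Python sketch. It assumes cosine similarity multiplied by a scale factor of 20 before the softmax; the helper names cosine and in_batch_negatives_loss are hypothetical, and a real implementation would operate on tensors rather than Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(query_embs, doc_embs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other document in the batch acts as a negative."""
    n = len(query_embs)
    total = 0.0
    for i in range(n):
        logits = [scale * cosine(query_embs[i], doc_embs[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the batch
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-likelihood of picking the correct document
        total += log_z - logits[i]
    return total / n


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own document
swapped_docs = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the wrong document

loss_aligned = in_batch_negatives_loss(queries, aligned_docs)
loss_swapped = in_batch_negatives_loss(queries, swapped_docs)
```

The aligned batch yields a loss near zero, while the swapped batch yields a loss close to the scale factor, which is the behaviour the two bullet points describe.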

Concept: Learning Rate Warmup

Warmup is a critical technique when finetuning pretrained models:

Problem -- Large initial learning rates can destroy pretrained representations
Solution -- Gradually increase the learning rate from zero to the target value over the first warmup_steps training steps
LlamaIndex default -- Warmup covers 10% of total training steps

The number of warmup steps is computed as:

warmup_steps = int(len(data_loader) * epochs * 0.1)

After warmup, a linear decay schedule is typically applied for the remainder of training.
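
As a rough sketch of the resulting schedule, the function below returns the factor by which the target learning rate is multiplied at a given step, assuming linear warmup followed by linear decay to zero. The name warmup_linear_multiplier and the value batches_per_epoch = 25 (standing in for len(data_loader)) are illustrative assumptions.

```python
def warmup_linear_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier: rises linearly from 0 to 1 during warmup,
    then falls linearly back to 0 at total_steps."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


batches_per_epoch = 25                      # stands in for len(data_loader)
epochs = 4
total_steps = batches_per_epoch * epochs    # 100 optimizer steps
warmup_steps = int(batches_per_epoch * epochs * 0.1)  # 10% of training

# Multipliers at the start, mid-warmup, end of warmup, mid-decay, and the end
schedule = [
    warmup_linear_multiplier(s, warmup_steps, total_steps)
    for s in (0, 5, 10, 55, 100)
]
```

The effective learning rate at any step is the target rate times this multiplier, so the pretrained weights see only small updates during the first 10 steps.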

Concept: Training Objectives Format

The Sentence Transformers model.fit() method accepts training objectives as a list of (DataLoader, loss_function) tuples:

train_objectives = [(self.loader, self.loss)]

This design allows for multi-task training where different data loaders use different loss functions. In LlamaIndex's embedding finetuning, a single objective (the QA pairs with contrastive loss) is used.
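
To make the multi-task idea concrete, here is a minimal sketch of how a fit-style loop might consume such a list, drawing one batch from each objective per step and restarting a loader when it is exhausted. It uses plain Python lists and functions as stand-ins for DataLoader and loss objects; iterate_objectives is a hypothetical helper, not part of Sentence Transformers.

```python
def iterate_objectives(train_objectives, steps):
    """Round-robin over (loader, loss_fn) pairs: one batch per objective per step."""
    iterators = [iter(loader) for loader, _ in train_objectives]
    history = []
    for step in range(steps):
        for idx, (loader, loss_fn) in enumerate(train_objectives):
            try:
                batch = next(iterators[idx])
            except StopIteration:
                # Loader exhausted: start it again from the beginning
                iterators[idx] = iter(loader)
                batch = next(iterators[idx])
            history.append((step, idx, loss_fn(batch)))
    return history


# Two toy objectives; the "batches" are plain numbers for illustration
objectives = [
    ([1, 2], lambda batch: batch * 2),   # objective 0
    ([10], lambda batch: batch + 1),     # objective 1
]
history = iterate_objectives(objectives, steps=2)
```

With a single objective, as in LlamaIndex's embedding finetuning, the inner loop runs once per step and this reduces to ordinary single-task training.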

Concept: Evaluation During Training

When a validation dataset is provided, the InformationRetrievalEvaluator measures retrieval quality at regular intervals:

Metrics computed -- MRR (Mean Reciprocal Rank), NDCG (Normalized Discounted Cumulative Gain), MAP (Mean Average Precision)
Evaluation frequency -- Controlled by evaluation_steps (default: every 50 steps)
Purpose -- Detect overfitting, select the best checkpoint, and monitor convergence

Evaluation runs the current model on the validation queries, retrieves documents, and compares against the ground truth relevance judgments.
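
The three metrics can be sketched for a single query as follows, assuming binary relevance and a cutoff k. The function names are hypothetical and the average-precision normalization shown is one common convention, so this is not the evaluator's own code; corpus-level MRR, NDCG, and MAP would be the mean of these values over all validation queries.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids, k=10):
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def average_precision(ranked_ids, relevant_ids, k=10):
    """Mean of precision values taken at each rank where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, len(relevant_ids))
    return precision_sum / denom if denom else 0.0


# One query whose only relevant document was retrieved at rank 2
ranked = ["d3", "d1", "d2"]
relevant = {"d1"}
rr = reciprocal_rank(ranked, relevant)
gain = ndcg(ranked, relevant)
ap = average_precision(ranked, relevant)
```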

Concept: Checkpoint Management During Training

For long-running finetuning jobs, checkpointing provides fault tolerance and model selection:

Checkpoint path -- Stored under checkpoints/{model_output_path}/
Save frequency -- Controlled by checkpoint_save_steps
Storage limit -- checkpoint_save_total_limit prevents disk exhaustion (0 means keep all)
Resume capability -- Set resume_from_checkpoint=True to continue from the latest checkpoint
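
The storage limit and resume behaviour can be illustrated with a small sketch. It assumes each checkpoint lives in a subdirectory named after its training step; that layout and the helper names list_checkpoint_steps, prune_checkpoints, and latest_checkpoint are assumptions for illustration, not the library's code.

```python
import os
import shutil
import tempfile


def list_checkpoint_steps(checkpoint_dir):
    """Return the step numbers of all step-named checkpoint folders, ascending."""
    return sorted(int(name) for name in os.listdir(checkpoint_dir) if name.isdigit())


def prune_checkpoints(checkpoint_dir, save_total_limit):
    """Delete the oldest checkpoints so at most save_total_limit remain.
    A limit of 0 keeps everything."""
    if save_total_limit <= 0:
        return []
    steps = list_checkpoint_steps(checkpoint_dir)
    stale = steps[:-save_total_limit]
    for step in stale:
        shutil.rmtree(os.path.join(checkpoint_dir, str(step)))
    return stale


def latest_checkpoint(checkpoint_dir):
    """Path of the highest-step checkpoint, or None if there is none."""
    steps = list_checkpoint_steps(checkpoint_dir)
    return os.path.join(checkpoint_dir, str(steps[-1])) if steps else None


# Demo in a throwaway directory: four checkpoints, keep the newest two
with tempfile.TemporaryDirectory() as root:
    for step in (50, 100, 150, 200):
        os.makedirs(os.path.join(root, str(step)))
    removed = prune_checkpoints(root, save_total_limit=2)
    remaining = list_checkpoint_steps(root)
    resume_from = os.path.basename(latest_checkpoint(root))
```

Resuming then amounts to loading the state found at the latest checkpoint path and continuing from that step.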

Concept: Training Duration Considerations

Factor	Consideration
Dataset size	Larger datasets generally need fewer epochs to converge and are less prone to overfitting per epoch
Epochs	2-5 epochs is typical for embedding finetuning; more risks overfitting
Batch size	Affects both training speed and quality of in-batch negatives
Evaluation frequency	Too frequent evaluation slows training; too infrequent misses optimal checkpoints

Concept: Adapter Finetuning Execution

The adapter-based approach (EmbeddingAdapterFinetuneEngine) has a different execution path:

Base embeddings are pre-computed and frozen
Only the adapter layer (e.g., a linear transformation) is trained
Training uses a custom train_model function rather than Sentence Transformers' model.fit()
The loss operates on transformed query embeddings versus raw document embeddings
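
The asymmetry in the last point can be sketched in a few lines: the adapter transforms only the query embedding, and the result is scored against unmodified document embeddings. The helper names and the hand-picked weights below are hypothetical and chosen for readability; in practice the weights are learned by the training function.

```python
def apply_linear_adapter(weight, bias, embedding):
    """Compute weight @ embedding + bias for a single embedding vector."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weight, bias)
    ]


def dot(u, v):
    """Dot-product similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_documents(weight, bias, query_emb, doc_embs):
    """Adapt the query only, then score it against raw (frozen) document embeddings."""
    adapted_query = apply_linear_adapter(weight, bias, query_emb)
    return [dot(adapted_query, doc) for doc in doc_embs]


query = [1.0, 0.0]                      # pre-computed base query embedding
documents = [[1.0, 0.0], [0.0, 1.0]]    # pre-computed base document embeddings

identity_w, zero_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
swap_w = [[0.0, 1.0], [1.0, 0.0]]       # illustrative "trained" weights

scores_before = score_documents(identity_w, zero_b, query, documents)
scores_after = score_documents(swap_w, zero_b, query, documents)
```

Because the document embeddings never change, an existing vector index does not need to be rebuilt after the adapter is trained; only queries pass through the adapter at retrieval time.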

Knowledge Sources

LlamaIndex Embedding Finetuning Guide
Sentence Transformers Training Overview

Metadata

Machine Learning, Embeddings, Finetuning, Contrastive Learning, LlamaIndex

Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Finetune

2026-02-11 00:00 GMT
