Principle:Run llama Llama index Embedding Finetune Configuration

From Leeroopedia

Overview

Embedding Finetune Configuration covers the design decisions and setup involved in configuring an embedding model for finetuning. This includes choosing between full Sentence Transformers finetuning and adapter-based finetuning, selecting appropriate loss functions, configuring training hyperparameters, and preparing the training infrastructure.

Concept: Sentence Transformers Finetuning vs Adapter-Based Finetuning

LlamaIndex supports two distinct approaches to embedding finetuning:

Approach | Description | When to Use
Sentence Transformers (Full) | Finetunes all parameters of a Sentence Transformer model on domain-specific data | When you have sufficient training data and want maximum performance improvement
Adapter-Based | Freezes the base embedding model and trains a lightweight adapter layer on top | When you want to preserve the base model's general capabilities while adding domain specialization

Full Finetuning

Full finetuning modifies all model weights. The SentenceTransformersFinetuneEngine loads a pretrained Sentence Transformer model (e.g., BAAI/bge-small-en) and trains it end-to-end on query-document pairs. This approach:

  • Provides the most flexibility for domain adaptation
  • Requires more training data for good generalization
  • Produces a self-contained model that can be loaded directly

Adapter-Based Finetuning

Adapter finetuning adds a small trainable layer (typically a linear transformation) on top of frozen base embeddings. The EmbeddingAdapterFinetuneEngine embeds all queries and documents using the base model first, then trains the adapter to transform these embeddings for better retrieval. This approach:

  • Requires less training data
  • Preserves the base model's general-purpose capabilities
  • Results in a smaller additional model artifact (just the adapter weights)
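The adapter idea above can be sketched in plain Python. This is a minimal illustration, not the EmbeddingAdapterFinetuneEngine implementation: the helper names are hypothetical, the three-dimensional vector stands in for a frozen base-model embedding, and starting from an identity matrix is an assumption made here so that the untrained adapter leaves embeddings unchanged.

```python
from typing import List

Vector = List[float]


def identity_matrix(dim: int) -> List[Vector]:
    """Build a dim x dim identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_linear_adapter(embedding: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Return weight @ embedding + bias, the linear transformation the adapter learns."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weight, bias)
    ]


# Stand-in for the output of the frozen base embedding model (hypothetical values).
base_embedding = [0.2, -0.5, 0.1]

# Only these parameters would be trained; the base model is never updated.
weight = identity_matrix(3)
bias = [0.0, 0.0, 0.0]

adapted = apply_linear_adapter(base_embedding, weight, bias)
```

Because only `weight` and `bias` are trained, the saved artifact is on the order of dim x dim numbers rather than a full copy of the model.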

Concept: Loss Functions for Embedding Training

The loss function determines which training signal the model receives, so it should match the structure of the available training data:

MultipleNegativesRankingLoss (Default)

This is the default loss function used by SentenceTransformersFinetuneEngine. It implements a form of InfoNCE (Information Noise-Contrastive Estimation) loss:

  • Given a batch of (query, positive_document) pairs, it treats all other documents in the batch as negatives
  • The loss encourages the model to rank the positive document higher than all in-batch negatives
  • No explicit negative mining is needed -- negatives come "for free" from the batch

The mathematical formulation is:

L = -log( exp(sim(q, d+)) / sum_i(exp(sim(q, d_i))) )

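where q is a query, sim is cosine similarity, d+ is the positive document for q, and d_i iterates over all documents in the batch (including d+). The batch loss is the mean of L over all queries in the batch. The Sentence Transformers implementation additionally multiplies sim by a constant scale factor before exponentiation.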
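The formula can be checked numerically with a small pure-Python sketch. It is written for illustration only and is not the Sentence Transformers code: the function names are hypothetical, and the `scale` argument defaults to 1.0 so that the result matches the formula as written (the library uses a larger scale factor).

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def in_batch_negatives_loss(
    queries: List[Sequence[float]],
    documents: List[Sequence[float]],
    scale: float = 1.0,
) -> float:
    """Mean of -log softmax, where documents[i] is the positive for queries[i]."""
    losses = []
    for i, query in enumerate(queries):
        # Every document in the batch is scored; all but documents[i] act as negatives.
        scores = [scale * cosine_similarity(query, doc) for doc in documents]
        log_denominator = math.log(sum(math.exp(s) for s in scores))
        losses.append(log_denominator - scores[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]      # each positive matches its query
misaligned_docs = [[0.0, 1.0], [1.0, 0.0]]   # positives swapped

aligned_loss = in_batch_negatives_loss(queries, aligned_docs)
misaligned_loss = in_batch_negatives_loss(queries, misaligned_docs)
```

The aligned batch yields the lower loss, which is the ranking behavior the loss is meant to reward. Adding more documents to the batch adds more terms to the denominator, which is why batch size matters for this loss.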

Custom Loss Functions

Users can provide any Sentence Transformers-compatible loss function via the loss parameter. Common alternatives include:

  • CosineSimilarityLoss -- When you have explicit similarity scores for text pairs
  • TripletLoss -- When you have explicit (anchor, positive, negative) triplets
  • ContrastiveLoss -- When you have binary similar/dissimilar labels for text pairs

Concept: Key Hyperparameters

Hyperparameter | Default | Impact
model_id | "BAAI/bge-small-en" | The pretrained model to start from. Larger models generally perform better but require more resources.
batch_size | 10 | Larger batches provide more in-batch negatives for MultipleNegativesRankingLoss, potentially improving quality.
epochs | 2 | Number of passes through the training data. Too many epochs can lead to overfitting on small datasets.
evaluation_steps | 50 | How often (in training steps) to evaluate on the validation set during training.
use_all_docs | False | If True, creates training pairs for all relevant documents per query (not just the first).
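A hedged sketch of how these hyperparameters might be passed to the engine follows. The keyword names and defaults come from the table above; the import paths reflect recent LlamaIndex releases as far as known and may differ in your installed version, and the helper name and JSON file names are hypothetical. The imports are deferred into the function so the snippet loads even when the optional finetuning package is not installed.

```python
from typing import Any, Dict, Optional

# Defaults as listed in the hyperparameter table.
DEFAULT_ENGINE_KWARGS: Dict[str, Any] = {
    "model_id": "BAAI/bge-small-en",
    "batch_size": 10,
    "epochs": 2,
    "evaluation_steps": 50,
    "use_all_docs": False,
}


def build_finetune_engine(
    train_json: str,
    val_json: Optional[str] = None,
    **overrides: Any,
):
    """Create a SentenceTransformersFinetuneEngine from dataset files (hypothetical helper)."""
    # Assumed import paths; adjust to the LlamaIndex version you have installed.
    from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
    from llama_index.finetuning import SentenceTransformersFinetuneEngine

    kwargs = {**DEFAULT_ENGINE_KWARGS, **overrides}
    train_dataset = EmbeddingQAFinetuneDataset.from_json(train_json)
    val_dataset = EmbeddingQAFinetuneDataset.from_json(val_json) if val_json else None
    return SentenceTransformersFinetuneEngine(
        train_dataset,
        val_dataset=val_dataset,
        **kwargs,
    )
```

A call such as `build_finetune_engine("train_dataset.json", "val_dataset.json", batch_size=32).finetune()` would then start training with a larger batch, which supplies more in-batch negatives per step.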

Concept: Warmup Steps

Warmup steps are automatically calculated as 10% of total training steps:

warmup_steps = int(len(data_loader) * epochs * 0.1)

During warmup, the learning rate gradually increases from zero to the target learning rate. This prevents destabilizing the pretrained weights with large initial gradient updates.
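As a worked example of the calculation, the sketch below computes the warmup step count and a learning-rate ramp. The linear shape of the ramp, the target learning rate of 2e-5, and the function names are assumptions for illustration; the behavior after warmup (for example, decay) is deliberately left out.

```python
def compute_warmup_steps(batches_per_epoch: int, epochs: int) -> int:
    """10% of total training steps, truncated toward zero by int()."""
    return int(batches_per_epoch * epochs * 0.1)


def warmup_learning_rate(step: int, warmup_steps: int, target_lr: float) -> float:
    """Linear ramp from 0 to target_lr over warmup_steps, then hold at target_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return target_lr * step / warmup_steps


# 47 batches per epoch over 2 epochs gives 94 total steps, so 9.4 truncates to 9.
warmup_steps = compute_warmup_steps(batches_per_epoch=47, epochs=2)

# Learning rate at the first few steps, assuming a hypothetical target of 2e-5.
schedule = [warmup_learning_rate(step, warmup_steps, 2e-5) for step in range(12)]
```

Note that `int()` truncates, so very small datasets can end up with zero warmup steps (fewer than 10 total training steps).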

Concept: Validation and Evaluation

When a val_dataset is provided, an InformationRetrievalEvaluator is created to measure retrieval quality during training. This evaluator:

  • Uses the validation queries, corpus, and relevance judgments
  • Computes standard information retrieval metrics (e.g., MRR, NDCG, MAP)
  • Runs at intervals defined by evaluation_steps

This allows monitoring for overfitting and selecting the best checkpoint.
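To make one of these metrics concrete, here is a small pure-Python sketch of mean reciprocal rank (MRR). It is not the InformationRetrievalEvaluator implementation; the function name and the toy query and document IDs are hypothetical, and the ranked lists stand in for retrieval results produced by the model under evaluation.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked_results: Dict[str, List[str]],
    relevant_docs: Dict[str, Set[str]],
) -> float:
    """Average of 1/rank of the first relevant document per query (0 if none is found)."""
    if not ranked_results:
        return 0.0
    total = 0.0
    for query_id, doc_ids in ranked_results.items():
        relevant = relevant_docs.get(query_id, set())
        for rank, doc_id in enumerate(doc_ids, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Relevance judgments: which corpus documents answer each validation query.
relevant_docs = {"q1": {"d1"}, "q2": {"d3"}}

# Retrieval output, best match first.
ranked_results = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d3", "d2"]}

mrr = mean_reciprocal_rank(ranked_results, relevant_docs)  # (1/1 + 1/2) / 2 = 0.75
```

A validation metric that stops improving or starts falling while training loss keeps dropping is the usual sign of overfitting.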

Concept: Checkpoint Management

For long-running finetuning jobs, checkpointing is controlled by the following parameters:

  • save_checkpoints -- Enable/disable checkpoint saving
  • checkpoint_save_steps -- Save a checkpoint every N training steps
  • checkpoint_save_total_limit -- Maximum number of checkpoints to keep (0 = unlimited)
  • resume_from_checkpoint -- Resume training from the latest checkpoint
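The interaction between the save interval and the total limit can be sketched as follows. This models only the bookkeeping, under the assumption that the oldest checkpoints are discarded first once the limit is exceeded; the function names are hypothetical and no files are written.

```python
from typing import List


def checkpoint_steps(total_steps: int, save_every: int) -> List[int]:
    """Training steps at which a checkpoint would be written."""
    return list(range(save_every, total_steps + 1, save_every))


def retained_checkpoints(saved_steps: List[int], total_limit: int) -> List[int]:
    """Checkpoints still kept, assuming oldest-first deletion (0 = unlimited)."""
    if total_limit <= 0:
        return list(saved_steps)
    return saved_steps[-total_limit:]


saved = checkpoint_steps(total_steps=2000, save_every=500)  # [500, 1000, 1500, 2000]
kept_limited = retained_checkpoints(saved, total_limit=2)   # [1500, 2000]
kept_all = retained_checkpoints(saved, total_limit=0)       # all four checkpoints
```

Under this assumption, resuming from the latest checkpoint after an interruption would restart from step 2000 in the example, losing at most one save interval of work.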

Knowledge Sources

  • LlamaIndex Embedding Finetuning Guide
  • Sentence Transformers Training Overview
  • MultipleNegativesRankingLoss

Metadata

Machine Learning, Embeddings, Finetuning, Contrastive Learning, LlamaIndex

  • Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Init
  • Heuristic:Run_llama_Llama_index_Finetuning_Warmup_Steps

2026-02-11 00:00 GMT
