Implementation:Run llama Llama index SentenceTransformersFinetuneEngine Finetune
Overview
The finetune method of SentenceTransformersFinetuneEngine executes the actual embedding model training. It delegates to the Sentence Transformers model.fit() method, passing all the training configuration that was set up during initialization, including train objectives, evaluator, epochs, warmup steps, output path, and checkpoint settings.
Source Location
| Property | Value |
|---|---|
| File | llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py |
| Lines | 90-104 |
| Class | SentenceTransformersFinetuneEngine |
| Method | finetune(**train_kwargs) -> None |
| Invocation | engine.finetune() |
Method Signature
def finetune(self, **train_kwargs: Any) -> None:
    """Finetune model."""
    self.model.fit(
        train_objectives=[(self.loader, self.loss)],
        epochs=self.epochs,
        warmup_steps=self.warmup_steps,
        output_path=self.model_output_path,
        show_progress_bar=self.show_progress_bar,
        evaluator=self.evaluator,
        evaluation_steps=self.evaluation_steps,
        checkpoint_path=self.checkpoint_path,
        resume_from_checkpoint=self.resume_from_checkpoint,
        checkpoint_save_steps=self.checkpoint_save_steps,
        checkpoint_save_total_limit=self.checkpoint_save_total_limit,
    )
Parameters
| Parameter | Type | Description |
|---|---|---|
| **train_kwargs | Any | Additional keyword arguments. Note: these are accepted by the method signature but are not passed through to model.fit() in the current implementation. |
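The consequence of the note above can be illustrated with a minimal sketch. `FakeModel` and `SketchEngine` are hypothetical stand-ins written for this example, not classes from LlamaIndex or Sentence Transformers; they only mirror the shape of the finetune() method shown under Method Signature.

```python
from typing import Any, Dict


class FakeModel:
    """Hypothetical stand-in for a SentenceTransformer; records what fit() receives."""

    def __init__(self) -> None:
        self.received: Dict[str, Any] = {}

    def fit(self, **kwargs: Any) -> None:
        self.received = kwargs


class SketchEngine:
    """Hypothetical, reduced engine with the same finetune() shape as the real one."""

    def __init__(self, model: FakeModel, epochs: int = 2) -> None:
        self.model = model
        self.epochs = epochs

    def finetune(self, **train_kwargs: Any) -> None:
        # train_kwargs is accepted but not forwarded; only instance
        # attributes reach fit(), as described in the table above.
        self.model.fit(epochs=self.epochs)


model = FakeModel()
engine = SketchEngine(model, epochs=2)
engine.finetune(epochs=10, optimizer_params={"lr": 1e-5})
print(model.received)  # {'epochs': 2}
```

In practice this suggests that training settings should be passed to the engine constructor rather than to finetune().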
Return Value
Returns None. The finetuned model is saved to self.model_output_path on disk as a side effect.
Arguments Passed to model.fit()
All arguments are sourced from instance attributes set during __init__:
| Argument | Source | Description |
|---|---|---|
| train_objectives | [(self.loader, self.loss)] | List of (DataLoader, loss) tuples defining the training task |
| epochs | self.epochs | Number of training epochs (default: 2) |
| warmup_steps | self.warmup_steps | Steps for learning rate warmup (auto-calculated as 10% of total steps) |
| output_path | self.model_output_path | Directory to save the final finetuned model |
| show_progress_bar | self.show_progress_bar | Whether to display training progress |
| evaluator | self.evaluator | InformationRetrievalEvaluator instance, or None if no validation dataset |
| evaluation_steps | self.evaluation_steps | How often to run evaluation (default: 50 steps) |
| checkpoint_path | self.checkpoint_path | Directory for checkpoints, or None if disabled |
| resume_from_checkpoint | self.resume_from_checkpoint | Whether to resume from latest checkpoint |
| checkpoint_save_steps | self.checkpoint_save_steps | Save checkpoint every N steps |
| checkpoint_save_total_limit | self.checkpoint_save_total_limit | Max checkpoints to keep (0 = unlimited) |
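The warmup_steps entry can be made concrete with a small calculation. This is a sketch that assumes total steps means batches per epoch multiplied by epochs and that the result is truncated to an integer; `warmup_steps_for` is a hypothetical helper, not part of the library.

```python
def warmup_steps_for(batches_per_epoch: int, epochs: int, fraction: float = 0.1) -> int:
    """Return the number of warmup steps as a fraction of all training steps."""
    total_steps = batches_per_epoch * epochs
    # int() truncates, so very small runs can end up with zero warmup steps.
    return int(total_steps * fraction)


print(warmup_steps_for(batches_per_epoch=100, epochs=2))  # 20
print(warmup_steps_for(batches_per_epoch=5, epochs=1))    # 0
```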
Internal Behavior
The model.fit() method from Sentence Transformers performs:
- Sets up the optimizer with linear warmup and decay scheduling
- Iterates through epochs, processing batches from the DataLoader
- For each batch:
  - Encodes the query and document texts through the Sentence Transformer
  - Computes the MultipleNegativesRankingLoss (or custom loss)
  - Performs backpropagation and parameter update
- At evaluation intervals (if evaluator is set):
  - Runs the InformationRetrievalEvaluator on validation data
  - Logs retrieval metrics (MRR, NDCG, MAP)
- At checkpoint intervals (if checkpointing enabled):
  - Saves model state to the checkpoint directory
  - Cleans up old checkpoints if total limit is set
- After all epochs:
  - Saves the final model to output_path
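The "linear warmup and decay" step in the list above can be sketched as a learning-rate multiplier. This is a plain-Python illustration of the general technique, not the library's code; `linear_warmup_decay` is a hypothetical name, and the exact schedule used in a given run depends on the scheduler that fit() is configured with.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Then decay linearly from 1 down to 0 at the final step.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 10, 20, 110, 200):
    print(step, linear_warmup_decay(step, warmup_steps=20, total_steps=200))
```

With 20 warmup steps out of 200, the multiplier reaches 1.0 at step 20, falls to 0.5 at step 110, and reaches 0.0 at step 200.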
Side Effects
- Writes the finetuned model to self.model_output_path
- If checkpointing is enabled, writes checkpoint files to checkpoints/{model_output_path}/
- Prints progress bar and evaluation metrics to stdout
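The checkpoint limit (checkpoint_save_total_limit, where 0 means unlimited) can be pictured with the following sketch. It assumes each checkpoint lives in a subdirectory named after its training step; that layout and the helper `checkpoints_to_delete` are assumptions made for illustration, and the function only selects names without touching the filesystem.

```python
from typing import List


def checkpoints_to_delete(subdirs: List[str], total_limit: int) -> List[str]:
    """Return the oldest step-named checkpoint directories beyond total_limit."""
    if total_limit <= 0:
        # A limit of 0 is treated as "keep every checkpoint".
        return []
    # Ignore entries that are not step-numbered, then order oldest first.
    steps = sorted(int(name) for name in subdirs if name.isdigit())
    excess = max(0, len(steps) - total_limit)
    return [str(step) for step in steps[:excess]]


print(checkpoints_to_delete(["50", "100", "150", "200"], total_limit=2))  # ['50', '100']
print(checkpoints_to_delete(["50", "100"], total_limit=0))                # []
```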
Usage Example
from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    EmbeddingQAFinetuneDataset,
)

# Load dataset and configure engine
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

finetune_engine = SentenceTransformersFinetuneEngine(
    dataset=train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_model",
    val_dataset=val_dataset,
    epochs=2,
)

# Execute finetuning
finetune_engine.finetune()
# Model is now saved to "finetuned_model/" directory
Dependencies
- sentence_transformers.SentenceTransformer.fit: Core training method from the Sentence Transformers library
- All instance attributes set during __init__ (loader, loss, evaluator, etc.)
Knowledge Sources
- LlamaIndex Finetuning Source
- SentenceTransformer.fit API
Metadata
Machine Learning Embeddings Finetuning LlamaIndex
Principle:Run_llama_Llama_index_Embedding_Finetune_Execution