Implementation:Run llama Llama index SentenceTransformersFinetuneEngine Finetune

Overview

The finetune method of SentenceTransformersFinetuneEngine runs the embedding model training. It delegates to the Sentence Transformers model.fit() method, passing the training configuration stored on the instance during initialization: train objectives, evaluator, epochs, warmup steps, output path, and checkpoint settings.

Source Location

Property	Value
File	`llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py`
Lines	90-104
Class	`SentenceTransformersFinetuneEngine`
Method	`finetune(**train_kwargs) -> None`
Invocation	`engine.finetune()`

Method Signature

def finetune(self, **train_kwargs: Any) -> None:
    """Finetune model."""
    self.model.fit(
        train_objectives=[(self.loader, self.loss)],
        epochs=self.epochs,
        warmup_steps=self.warmup_steps,
        output_path=self.model_output_path,
        show_progress_bar=self.show_progress_bar,
        evaluator=self.evaluator,
        evaluation_steps=self.evaluation_steps,
        checkpoint_path=self.checkpoint_path,
        resume_from_checkpoint=self.resume_from_checkpoint,
        checkpoint_save_steps=self.checkpoint_save_steps,
        checkpoint_save_total_limit=self.checkpoint_save_total_limit,
    )

Parameters

Parameter	Type	Description
**train_kwargs	`Any`	Additional keyword arguments. Note: these are accepted by the method signature but are not passed through to `model.fit()` in the current implementation.
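The practical consequence of the note above is that an override passed to finetune() has no effect. The following is a minimal sketch of that pattern, not the library code: RecordingModel and EngineSketch are hypothetical stand-ins that only mirror the "accept but do not forward" shape of the method.

```python
from typing import Any, Dict, List


class RecordingModel:
    """Hypothetical stand-in for a SentenceTransformer; records fit() calls."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def fit(self, **kwargs: Any) -> None:
        # Store the keyword arguments instead of training anything.
        self.calls.append(kwargs)


class EngineSketch:
    """Hypothetical, trimmed-down mirror of the engine's finetune() pattern."""

    def __init__(self, model: RecordingModel, epochs: int = 2) -> None:
        self.model = model
        self.epochs = epochs

    def finetune(self, **train_kwargs: Any) -> None:
        # train_kwargs is accepted but never forwarded to fit().
        self.model.fit(epochs=self.epochs)


model = RecordingModel()
engine = EngineSketch(model, epochs=2)
engine.finetune(epochs=10)  # the override is silently ignored
recorded_epochs = model.calls[0]["epochs"]
```

Under this pattern, training settings such as the epoch count need to be set through the constructor rather than through finetune().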

Return Value

Returns None. The finetuned model is saved to self.model_output_path on disk as a side effect.

Arguments Passed to model.fit()

All arguments are sourced from instance attributes set during __init__:

Argument	Source	Description
train_objectives	`[(self.loader, self.loss)]`	List of (DataLoader, loss) tuples defining the training task
epochs	`self.epochs`	Number of training epochs (default: 2)
warmup_steps	`self.warmup_steps`	Steps for learning rate warmup (auto-calculated as 10% of total steps)
output_path	`self.model_output_path`	Directory to save the final finetuned model
show_progress_bar	`self.show_progress_bar`	Whether to display training progress
evaluator	`self.evaluator`	InformationRetrievalEvaluator instance, or None if no validation dataset
evaluation_steps	`self.evaluation_steps`	How often to run evaluation (default: 50 steps)
checkpoint_path	`self.checkpoint_path`	Directory for checkpoints, or None if disabled
resume_from_checkpoint	`self.resume_from_checkpoint`	Whether to resume from latest checkpoint
checkpoint_save_steps	`self.checkpoint_save_steps`	Save checkpoint every N steps
checkpoint_save_total_limit	`self.checkpoint_save_total_limit`	Max checkpoints to keep (0 = unlimited)
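To make the "10% of total steps" warmup figure concrete, here is a small sketch of the arithmetic. It assumes that total steps equal batches per epoch times epochs, that the last partial batch is kept, and that the result is truncated to an integer; warmup_steps_for is a hypothetical helper, not part of LlamaIndex.

```python
import math


def warmup_steps_for(
    num_examples: int,
    batch_size: int,
    epochs: int,
    warmup_fraction: float = 0.1,
) -> int:
    """Estimate warmup steps as a fraction of the total optimizer steps."""
    # Assumption: the DataLoader keeps the last partial batch.
    batches_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = batches_per_epoch * epochs
    # Assumption: the fractional result is truncated, not rounded.
    return int(total_steps * warmup_fraction)


# 1000 examples in batches of 10 over 2 epochs gives 200 total steps.
example_warmup = warmup_steps_for(num_examples=1000, batch_size=10, epochs=2)
```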

Internal Behavior

The model.fit() method from Sentence Transformers performs the following steps:

1. Sets up the optimizer with linear warmup and decay scheduling
2. Iterates through epochs, processing batches from the DataLoader
3. For each batch:
   - Encodes the query and document texts through the Sentence Transformer
   - Computes the MultipleNegativesRankingLoss (or custom loss)
   - Performs backpropagation and parameter update
4. At evaluation intervals (if evaluator is set):
   - Runs the InformationRetrievalEvaluator on validation data
   - Logs retrieval metrics (MRR, NDCG, MAP)
5. At checkpoint intervals (if checkpointing enabled):
   - Saves model state to the checkpoint directory
   - Cleans up old checkpoints if total limit is set
6. After all epochs:
   - Saves the final model to output_path
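The "linear warmup and decay" scheduling mentioned in the steps above can be illustrated with a learning-rate multiplier. This is a sketch of the usual shape of such a schedule, assuming the multiplier rises linearly from 0 to 1 over the warmup steps and then falls linearly to 0 at the final step; linear_warmup_decay is a hypothetical helper, and the scheduler Sentence Transformers actually uses may differ in detail.

```python
from typing import List


def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Return the learning-rate multiplier for a given optimizer step."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to 1.
        return step / max(1, warmup_steps)
    # Decay phase: fall linearly from 1 down to 0 at total_steps.
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


# Multipliers for a run of 200 steps with 20 warmup steps.
schedule: List[float] = [
    linear_warmup_decay(s, warmup_steps=20, total_steps=200) for s in range(201)
]
```

The effective learning rate at each step is the base learning rate times this multiplier, so the peak is reached exactly when warmup ends.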

Side Effects

Writes the finetuned model to self.model_output_path
If checkpointing is enabled, writes checkpoint files to checkpoints/{model_output_path}/
Prints progress bar and evaluation metrics to stdout
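The output locations listed above can be summarized with a small helper. This is only a sketch: expected_output_locations is hypothetical, the save_checkpoints flag name is an assumption about how checkpointing is enabled, and the checkpoints/{model_output_path} layout is taken from the description above.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def expected_output_locations(
    model_output_path: str, save_checkpoints: bool
) -> Dict[str, Optional[str]]:
    """Describe where a finetuning run is expected to write files."""
    # Assumption: checkpoints live under a top-level "checkpoints" directory.
    checkpoint_dir = (
        str(PurePosixPath("checkpoints") / model_output_path)
        if save_checkpoints
        else None
    )
    return {"final_model": model_output_path, "checkpoints": checkpoint_dir}


locations = expected_output_locations("finetuned_model", save_checkpoints=True)
```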

Usage Example

from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    EmbeddingQAFinetuneDataset,
)

# Load dataset and configure engine
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

finetune_engine = SentenceTransformersFinetuneEngine(
    dataset=train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_model",
    val_dataset=val_dataset,
    epochs=2,
)

# Execute finetuning
finetune_engine.finetune()
# Model is now saved to "finetuned_model/" directory

Dependencies

sentence_transformers.SentenceTransformer.fit -- Core training method from Sentence Transformers library
All instance attributes set during __init__ (loader, loss, evaluator, etc.)

Knowledge Sources

LlamaIndex Finetuning Source, SentenceTransformer.fit API

Metadata

Machine Learning, Embeddings, Finetuning, LlamaIndex

Principle:Run_llama_Llama_index_Embedding_Finetune_Execution

2026-02-11 00:00 GMT
