
Implementation:Run llama Llama index SentenceTransformersFinetuneEngine Finetune

From Leeroopedia
Revision as of 11:48, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Finetune.md)

Overview

The finetune method of SentenceTransformersFinetuneEngine runs the embedding model training. It delegates to the Sentence Transformers model.fit() method and passes along the training configuration set up during initialization: train objectives, epochs, warmup steps, output path, progress-bar setting, evaluator, evaluation interval, and checkpoint settings.

Source Location

| Property | Value |
|---|---|
| File | llama-index-finetuning/llama_index/finetuning/embeddings/sentence_transformer.py |
| Lines | 90-104 |
| Class | SentenceTransformersFinetuneEngine |
| Method | finetune(**train_kwargs) -> None |
| Invocation | engine.finetune() |

Method Signature

def finetune(self, **train_kwargs: Any) -> None:
    """Finetune model."""
    self.model.fit(
        train_objectives=[(self.loader, self.loss)],
        epochs=self.epochs,
        warmup_steps=self.warmup_steps,
        output_path=self.model_output_path,
        show_progress_bar=self.show_progress_bar,
        evaluator=self.evaluator,
        evaluation_steps=self.evaluation_steps,
        checkpoint_path=self.checkpoint_path,
        resume_from_checkpoint=self.resume_from_checkpoint,
        checkpoint_save_steps=self.checkpoint_save_steps,
        checkpoint_save_total_limit=self.checkpoint_save_total_limit,
    )

Parameters

| Parameter | Type | Description |
|---|---|---|
| **train_kwargs | Any | Additional keyword arguments. Note: these are accepted by the method signature but are not passed through to model.fit() in the current implementation. |
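The consequence of that note is that a call such as engine.finetune(epochs=10) does not override the value given at construction time. The sketch below uses a hypothetical stand-in class, StubEngine, that copies only the signature pattern; it is not the LlamaIndex class, and the last_fit_args attribute exists purely so the effect can be inspected.

```python
from typing import Any, Dict, Optional


class StubEngine:
    """Hypothetical stand-in that mimics the signature pattern only."""

    def __init__(self, epochs: int = 2) -> None:
        self.epochs = epochs
        # Illustrative attribute: records what would be handed to model.fit().
        self.last_fit_args: Optional[Dict[str, Any]] = None

    def finetune(self, **train_kwargs: Any) -> None:
        # train_kwargs is accepted but never read, so it cannot
        # influence the arguments built from instance attributes.
        self.last_fit_args = {"epochs": self.epochs}


engine = StubEngine(epochs=2)
engine.finetune(epochs=10)  # the override is silently ignored
print(engine.last_fit_args)  # {'epochs': 2}
```

Under this pattern, training settings have to be chosen when the engine is constructed rather than at the finetune() call.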

Return Value

Returns None. The finetuned model is saved to self.model_output_path on disk as a side effect.

Arguments Passed to model.fit()

All arguments are sourced from instance attributes set during __init__:

| Argument | Source | Description |
|---|---|---|
| train_objectives | [(self.loader, self.loss)] | List of (DataLoader, loss) tuples defining the training task |
| epochs | self.epochs | Number of training epochs (default: 2) |
| warmup_steps | self.warmup_steps | Steps for learning rate warmup (auto-calculated as 10% of total steps) |
| output_path | self.model_output_path | Directory to save the final finetuned model |
| show_progress_bar | self.show_progress_bar | Whether to display training progress |
| evaluator | self.evaluator | InformationRetrievalEvaluator instance, or None if no validation dataset |
| evaluation_steps | self.evaluation_steps | How often to run evaluation (default: 50 steps) |
| checkpoint_path | self.checkpoint_path | Directory for checkpoints, or None if disabled |
| resume_from_checkpoint | self.resume_from_checkpoint | Whether to resume from latest checkpoint |
| checkpoint_save_steps | self.checkpoint_save_steps | Save checkpoint every N steps |
| checkpoint_save_total_limit | self.checkpoint_save_total_limit | Max checkpoints to keep (0 = unlimited) |
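The warmup_steps value can be made concrete with a little arithmetic. The sketch below assumes that "total steps" means batches per epoch times epochs, and that the result is truncated to an integer; the helper names num_batches and default_warmup_steps are illustrative and do not appear in the library.

```python
import math


def num_batches(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def default_warmup_steps(batches_per_epoch: int, epochs: int,
                         fraction: float = 0.1) -> int:
    """Warmup steps as a fraction of total training steps (assumed truncation)."""
    total_steps = batches_per_epoch * epochs
    return int(total_steps * fraction)


# 100 training pairs, batch size 10, 2 epochs -> 20 total steps
batches = num_batches(100, 10)
print(default_warmup_steps(batches, epochs=2))  # 2
```

With very small datasets the truncation can yield zero warmup steps, which is worth checking before training.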

Internal Behavior

The model.fit() method from Sentence Transformers performs the following steps:

  1. Sets up the optimizer with linear warmup and decay scheduling
  2. Iterates through epochs, processing batches from the DataLoader
  3. For each batch:
    • Encodes the query and document texts through the Sentence Transformer
    • Computes the MultipleNegativesRankingLoss (or custom loss)
    • Performs backpropagation and parameter update
  4. At evaluation intervals (if evaluator is set):
    • Runs the InformationRetrievalEvaluator on validation data
    • Logs retrieval metrics (MRR, NDCG, MAP)
  5. At checkpoint intervals (if checkpointing enabled):
    • Saves model state to the checkpoint directory
    • Cleans up old checkpoints if total limit is set
  6. After all epochs:
    • Saves the final model to output_path
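The schedule in step 1 can be sketched as a multiplier applied to the base learning rate at each step. The function below is a minimal, library-free illustration of linear warmup followed by linear decay; the name linear_warmup_decay and its arguments are hypothetical, and the scheduler actually used by Sentence Transformers may differ in detail.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier in [0, 1] for a given training step."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Then decay linearly from 1 down to 0 at the final step.
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


# 20 total steps with 2 warmup steps
for step in (0, 1, 2, 11, 20):
    print(step, linear_warmup_decay(step, warmup_steps=2, total_steps=20))
# 0 0.0
# 1 0.5
# 2 1.0
# 11 0.5
# 20 0.0
```

The effective learning rate at a step is the base rate multiplied by this factor.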

Side Effects

  • Writes the finetuned model to self.model_output_path
  • If checkpointing is enabled, writes checkpoint files to checkpoints/{model_output_path}/
  • Displays a training progress bar when self.show_progress_bar is true, and reports evaluation metrics when an evaluator is set
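The checkpoint location and retention behavior described above can be sketched as follows. This assumes the checkpoints/{model_output_path}/ layout stated here and treats a limit of 0 as "keep everything"; the helpers checkpoint_dir and checkpoints_to_prune are hypothetical names for illustration and are not library functions.

```python
import os
from typing import List, Optional


def checkpoint_dir(model_output_path: str, save_checkpoints: bool) -> Optional[str]:
    """Checkpoint directory, or None when checkpointing is disabled."""
    if not save_checkpoints:
        return None
    return os.path.join("checkpoints", model_output_path)


def checkpoints_to_prune(saved_steps: List[int], total_limit: int) -> List[int]:
    """Steps of the oldest checkpoints that exceed the limit (0 = unlimited)."""
    if total_limit <= 0 or len(saved_steps) <= total_limit:
        return []
    ordered = sorted(saved_steps)
    # Keep the newest `total_limit` checkpoints; everything older is pruned.
    return ordered[: len(ordered) - total_limit]


print(checkpoint_dir("finetuned_model", save_checkpoints=False))  # None
print(checkpoints_to_prune([500, 1000, 1500], total_limit=2))     # [500]
print(checkpoints_to_prune([500, 1000, 1500], total_limit=0))     # []
```

With no limit set, checkpoints accumulate at every save interval, so disk usage grows with training length.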

Usage Example

from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    EmbeddingQAFinetuneDataset,
)

# Load dataset and configure engine
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

finetune_engine = SentenceTransformersFinetuneEngine(
    dataset=train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="finetuned_model",
    val_dataset=val_dataset,
    epochs=2,
)

# Execute finetuning
finetune_engine.finetune()
# Model is now saved to "finetuned_model/" directory

Dependencies

  • sentence_transformers.SentenceTransformer.fit -- Core training method from Sentence Transformers library
  • All instance attributes set during __init__ (loader, loss, evaluator, etc.)

Knowledge Sources

  • LlamaIndex Finetuning Source
  • SentenceTransformer.fit API

Metadata

Machine Learning, Embeddings, Finetuning, LlamaIndex

Principle:Run_llama_Llama_index_Embedding_Finetune_Execution

2026-02-11 00:00 GMT
