Environment: run-llama LlamaIndex Sentence Transformers Finetuning

Knowledge Sources	LlamaIndex Finetuning Sentence Transformers
Domains	Embedding_Finetuning, Deep_Learning
Last Updated	2026-02-11 19:00 GMT

Overview

Python 3.10+ environment with PyTorch, sentence-transformers, and llama-index-finetuning for embedding model finetuning and adapter training.

Description

This environment provides the deep learning stack required for local embedding finetuning. It includes sentence-transformers for Sentence Transformer model training, PyTorch for tensor operations and GPU acceleration, and the llama-index-finetuning package which wraps these libraries with LlamaIndex-compatible interfaces. The `EmbeddingAdapterFinetuneEngine` uses PyTorch's `AdamW` optimizer and supports CUDA, MPS, and CPU devices.

Usage

Use this environment for the Embedding Finetuning workflow: generating QA pairs, training Sentence Transformer models, and training embedding adapter layers. It is also required for cross-encoder finetuning.
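The QA pairs produced in the first step are keyed by ID. The sketch below uses a hypothetical toy dataset laid out like the `queries`, `corpus`, and `relevant_docs` mappings of LlamaIndex's `EmbeddingQAFinetuneDataset` (an assumption about the field names; check them against your installed version) and expands it into (query, positive passage) pairs. That pair shape is the input that in-batch-negative losses such as sentence-transformers' `MultipleNegativesRankingLoss` consume.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: query id -> question, node id -> passage, query id -> relevant node ids
queries: Dict[str, str] = {"q1": "Which optimizer does the adapter engine use?"}
corpus: Dict[str, str] = {
    "n1": "The adapter engine uses the AdamW optimizer.",
    "n2": "Sentence Transformers supports many pretrained models.",
}
relevant_docs: Dict[str, List[str]] = {"q1": ["n1"]}


def to_training_pairs(
    queries: Dict[str, str],
    corpus: Dict[str, str],
    relevant_docs: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Expand each query into one (query, positive passage) pair per relevant node."""
    pairs: List[Tuple[str, str]] = []
    for query_id, query in queries.items():
        for node_id in relevant_docs.get(query_id, []):
            pairs.append((query, corpus[node_id]))
    return pairs


print(to_training_pairs(queries, corpus, relevant_docs))
```

No explicit negatives are stored: with an in-batch-negative loss, the positive passages of the other queries in the same batch act as negatives.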

System Requirements

Category	Requirement	Notes
OS	Linux or macOS	Windows via WSL
Python	>=3.10, <4.0	Stricter than the core package's 3.9+ requirement
Hardware	CPU, NVIDIA GPU (CUDA), or Apple Silicon (MPS)	GPU recommended for faster training
VRAM	4GB+ GPU memory	Applies to GPU training only; depends on embedding model size
Disk	2GB+	For PyTorch, model weights, and checkpoints
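A quick preflight check against the OS and Python rows of this table might look like the following. The helper names are hypothetical; `sys.version_info` and `platform.system()` are standard library, and under WSL `platform.system()` reports `Linux`.

```python
import platform
import sys
from typing import Sequence


def python_ok(version: Sequence[int] = tuple(sys.version_info[:2])) -> bool:
    """True if (major, minor) satisfies the '>=3.10, <4.0' requirement."""
    return (3, 10) <= tuple(version[:2]) < (4, 0)


def os_ok(system: str = platform.system()) -> bool:
    """Linux (including WSL) and macOS ('Darwin') are the listed platforms."""
    return system in {"Linux", "Darwin"}


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'ok' if python_ok() else 'needs >=3.10,<4.0'}")
    print(f"OS {platform.system()}: {'ok' if os_ok() else 'use Linux, macOS, or WSL'}")
```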

Dependencies

Python Packages

`llama-index-finetuning` >= 0.4.1
`llama-index-core` >= 0.13.0, < 0.15
`sentence-transformers` >= 2.3.0
`torch` (pulled in by sentence-transformers)
`llama-index-embeddings-adapter` >= 0.4.0, < 0.5
`llama-index-llms-azure-openai` >= 0.4.0, < 0.5
`llama-index-llms-mistralai` >= 0.7.0, < 0.8
`mistralai` >= 1.7.0
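To compare an existing environment against these pins, a stdlib-only sketch can read installed versions with `importlib.metadata`. The `REQUIREMENTS` table (a subset of the list above) and the naive `parse_version` helper are hypothetical, and the parser ignores pre-release and local-version semantics; if the `packaging` library is available, its `SpecifierSet` handles those cases properly.

```python
import itertools
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Hypothetical subset of the pins above: name -> (minimum inclusive, maximum exclusive or None)
REQUIREMENTS: Dict[str, Tuple[str, Optional[str]]] = {
    "llama-index-finetuning": ("0.4.1", None),
    "llama-index-core": ("0.13.0", "0.15"),
    "sentence-transformers": ("2.3.0", None),
    "llama-index-embeddings-adapter": ("0.4.0", "0.5"),
}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Naive parse: '2.1.0+cu118' -> (2, 1, 0); missing parts are padded with 0."""
    nums: List[int] = []
    for piece in version.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        nums.append(int(digits) if digits else 0)
    nums += [0] * (3 - len(nums))
    return (nums[0], nums[1], nums[2])


def in_range(installed: str, minimum: str, maximum: Optional[str]) -> bool:
    """True if minimum <= installed < maximum (no upper bound when maximum is None)."""
    v = parse_version(installed)
    return v >= parse_version(minimum) and (maximum is None or v < parse_version(maximum))


def find_problems(reqs: Dict[str, Tuple[str, Optional[str]]] = REQUIREMENTS) -> List[str]:
    """Return one human-readable line for each missing or out-of-range package."""
    problems: List[str] = []
    for name, (minimum, maximum) in reqs.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not in_range(installed, minimum, maximum):
            upper = f", <{maximum}" if maximum else ""
            problems.append(f"{name}: {installed} (need >={minimum}{upper})")
    return problems


if __name__ == "__main__":
    for line in find_problems() or ["all pinned packages look compatible"]:
        print(line)
```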

Credentials

No special credentials are required for local embedding finetuning, with one exception:

`OPENAI_API_KEY`: Required only when OpenAI is the LLM used for QA pair generation (`generate_qa_embedding_pairs`)
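A small guard can fail fast before QA generation starts rather than partway through it. The helper below is hypothetical; it only checks that the variable is set and non-empty, not that the key is valid.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY from env (default: os.environ) or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; it is only needed when OpenAI generates the QA pairs"
        )
    return key
```

Call it only on the OpenAI path; finetuning on an existing dataset with a local model needs no key.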

Quick Install

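# For GPU acceleration, install a CUDA-enabled PyTorch build first (cu118 = CUDA 11.8; pick the index matching your driver)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Install the finetuning package (pulls in sentence-transformers, and PyTorch if not already installed)
# Quote the specifier so the shell does not treat ">" as output redirection
pip install "llama-index-finetuning>=0.4.1"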
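After installing, it is worth confirming which PyTorch build is active. `torch.version.cuda` is `None` on CPU-only builds, and `torch.cuda.is_available()` reports whether a usable CUDA device is visible at runtime; the wrapper function name is hypothetical.

```python
import importlib.util
from typing import Dict, Optional, Union


def describe_torch_build() -> Optional[Dict[str, Union[str, bool, None]]]:
    """Summarize the installed PyTorch build, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": str(torch.__version__),
        # CUDA version the wheel was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # Whether a usable CUDA device and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

A CUDA build that reports `cuda_available` as `False` usually points at the driver or device visibility rather than the wheel itself.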

Code Evidence

Python version requirement from `llama-index-finetuning/pyproject.toml:31`:

requires-python = ">=3.10,<4.0"

Sentence-transformers dependency from `llama-index-finetuning/pyproject.toml:41`:

"sentence-transformers>=2.3.0",

Device auto-detection from `embeddings/adapter.py:74-76`:

if device is None:
    device = infer_torch_device()
    logger.info(f"Use pytorch device: {device}")
self._target_device = torch.device(device)
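The fallback order that `infer_torch_device()` applies (CUDA, then MPS, then CPU) can be mirrored in a standalone sketch, which is handy for logging the choice up front. `choose_device` and `detect_device` are hypothetical names; the `torch` calls are standard PyTorch APIs.

```python
import importlib.util


def choose_device(has_cuda: bool, has_mps: bool) -> str:
    """Priority order: CUDA, then Apple Silicon MPS, then CPU."""
    if has_cuda:
        return "cuda"
    if has_mps:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; assume CPU if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.backends.mps is absent on very old PyTorch releases, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    has_mps = mps is not None and mps.is_available()
    return choose_device(torch.cuda.is_available(), has_mps)


print(detect_device())
```

Because the engine only auto-detects when `device is None`, passing an explicit value such as `device="cpu"` skips detection entirely.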

Dimension auto-detection hack from `embeddings/adapter.py:62-65`:

# HACK: get dimension by passing text through it
if dim is None:
    test_embedding = self.embed_model.get_text_embedding("hello world")
    self.dim = len(test_embedding)
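The dimension probe costs one embedding call. The sketch below reproduces the same logic with a hypothetical `toy_embed` function standing in for `embed_model.get_text_embedding`; supplying `dim` explicitly skips the probe, which avoids an extra request when the embedding model is a remote API.

```python
from typing import Callable, List, Optional


def resolve_dim(embed: Callable[[str], List[float]], dim: Optional[int] = None) -> int:
    """Use dim if provided; otherwise embed a short test string and measure its length."""
    if dim is not None:
        return dim
    return len(embed("hello world"))


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in that always returns an 8-dimensional vector
    return [float(len(text))] * 8


print(resolve_dim(toy_embed))           # 8
print(resolve_dim(toy_embed, dim=384))  # 384
```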

Common Errors

Error Message	Cause	Solution
`ModuleNotFoundError: No module named 'sentence_transformers'`	sentence-transformers not installed	`pip install "sentence-transformers>=2.3.0"`
`RuntimeError: CUDA out of memory`	Insufficient GPU VRAM	Reduce `batch_size` (default 10) or pass `device="cpu"` to the engine
`ModuleNotFoundError: No module named 'torch'`	PyTorch not installed	`pip install torch`
Python version errors during install	Python < 3.10	Upgrade to Python 3.10+ (required by the finetuning package)
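For the out-of-memory case, one option is to retry with a smaller `batch_size` automatically. The helper below is hypothetical: `train` stands for whatever function builds and runs your finetune engine with a given batch size. It relies on PyTorch's OOM error being a `RuntimeError` whose message contains "out of memory", which holds for recent releases (`torch.cuda.OutOfMemoryError` subclasses `RuntimeError`).

```python
from typing import Callable


def run_with_batch_backoff(
    train: Callable[[int], None], batch_size: int = 10, min_batch_size: int = 1
) -> int:
    """Call train(batch_size), halving batch_size after each out-of-memory error.

    Returns the batch size that succeeded; re-raises other errors, or OOM at the minimum size.
    """
    while True:
        try:
            train(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory; retrying with batch_size={batch_size}")


def fake_train(batch_size: int) -> None:
    # Hypothetical trainer that only fits batches of 3 or fewer
    if batch_size > 3:
        raise RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")


print(run_with_batch_backoff(fake_train))  # 2
```

In a real run you may also want to call `torch.cuda.empty_cache()` between attempts so that cached blocks from the failed attempt are released.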

Compatibility Notes

Device selection: `infer_torch_device()` returns CUDA when available, then falls back to MPS (Apple Silicon), then CPU.
Python Version: The finetuning package requires Python 3.10+, which is stricter than the core package's 3.9+ requirement.
Adapter vs Full Finetuning: The adapter engine trains a lightweight linear layer on top of frozen embeddings, requiring much less VRAM than full model finetuning.
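A back-of-the-envelope comparison makes the VRAM difference concrete. Assuming the adapter is a square `dim` x `dim` linear layer (optionally with bias) trained in fp32 with AdamW, each trainable parameter needs roughly 16 bytes: 4 for the weight, 4 for its gradient, and 8 for AdamW's two moment buffers. Activations and any memory used by the frozen embedding model are not counted, and `dim=384` is only an example value; both helper names are hypothetical.

```python
def linear_adapter_params(dim: int, bias: bool = False) -> int:
    """Trainable parameters of a dim x dim linear layer, plus dim bias terms if enabled."""
    return dim * dim + (dim if bias else 0)


def adamw_fp32_state_bytes(n_params: int) -> int:
    """Weights + gradients + AdamW's two moment buffers, 4 bytes each in fp32."""
    return n_params * 4 * 4


n = linear_adapter_params(384)
print(n)                                  # 147456
print(adamw_fp32_state_bytes(n) / 2**20)  # 2.25 (MiB)
```

For full finetuning, the same formula applies to the base model's total parameter count, so a model with tens of millions of parameters already needs hundreds of megabytes of optimizer state before activations.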
