Environment:Run llama Llama index Sentence Transformers Finetuning
| Knowledge Sources | |
|---|---|
| Domains | Embedding_Finetuning, Deep_Learning |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
Python 3.10+ environment with PyTorch, sentence-transformers, and llama-index-finetuning for embedding model finetuning and adapter training.
Description
This environment provides the deep learning stack required for local embedding finetuning. It includes sentence-transformers for Sentence Transformer model training, PyTorch for tensor operations and GPU acceleration, and the llama-index-finetuning package, which wraps these libraries with LlamaIndex-compatible interfaces. The `EmbeddingAdapterFinetuneEngine` uses PyTorch's `AdamW` optimizer and supports CUDA, MPS, and CPU devices.
Usage
Use this environment for the Embedding Finetuning workflow: generating QA pairs, training Sentence Transformer models, and training embedding adapter layers. It is also required for cross-encoder finetuning.
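The workflow above can be outlined roughly as follows. This is a minimal sketch, assuming `generate_qa_embedding_pairs` and `SentenceTransformersFinetuneEngine` are importable from `llama_index.finetuning`. The helper name `run_embedding_finetune`, its parameters, and the default model and output path are illustrative choices, not values taken from this page.

```python
def run_embedding_finetune(train_nodes, val_nodes, llm,
                           model_id="BAAI/bge-small-en",
                           output_path="finetuned_model"):
    """Hypothetical helper: QA-pair generation, then Sentence Transformer finetuning."""
    # Imported lazily so this module still loads when the package is absent.
    from llama_index.finetuning import (
        SentenceTransformersFinetuneEngine,
        generate_qa_embedding_pairs,
    )

    # Step 1: synthesize question/context pairs from the text nodes.
    train_dataset = generate_qa_embedding_pairs(train_nodes, llm=llm)
    val_dataset = generate_qa_embedding_pairs(val_nodes, llm=llm)

    # Step 2: finetune a Sentence Transformer model on the generated pairs.
    engine = SentenceTransformersFinetuneEngine(
        train_dataset,
        model_id=model_id,
        model_output_path=output_path,
        val_dataset=val_dataset,
    )
    engine.finetune()

    # Step 3: return the finetuned weights wrapped as an embedding model.
    return engine.get_finetuned_model()
```

The returned model would typically be assigned to `Settings.embed_model` so that later indexing and retrieval use the finetuned embeddings.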
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux or macOS | Windows via WSL |
| Python | >=3.10, <4.0 | Higher requirement than core package |
| Hardware | CPU, NVIDIA GPU (CUDA), or Apple Silicon (MPS) | GPU recommended for faster training |
| VRAM | 4GB+ GPU memory | Depends on embedding model size |
| Disk | 2GB+ | For PyTorch, model weights, and checkpoints |
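The Python and disk rows of the table can be checked before installing anything. The sketch below is a hypothetical preflight script that uses only the standard library. The function names and the choice to measure free space in the current directory are assumptions made for illustration.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # inclusive lower bound from the table above
MAX_PYTHON = (4, 0)   # exclusive upper bound from the table above
MIN_FREE_GB = 2       # disk requirement from the table above


def python_supported(version=None):
    """Return True when the (major, minor) version is >=3.10 and <4.0."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON


def enough_disk(path=".", min_gb=MIN_FREE_GB):
    """Return True when at least min_gb gigabytes are free at path."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= min_gb


print("Python OK:", python_supported())
print("Disk OK:", enough_disk())
```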
Dependencies
Python Packages
- `llama-index-finetuning` >= 0.4.1
- `llama-index-core` >= 0.13.0, < 0.15
- `sentence-transformers` >= 2.3.0
- `torch` (pulled in by sentence-transformers)
- `llama-index-embeddings-adapter` >= 0.4.0, < 0.5
- `llama-index-llms-azure-openai` >= 0.4.0, < 0.5
- `llama-index-llms-mistralai` >= 0.7.0, < 0.8
- `mistralai` >= 1.7.0
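To confirm that an existing environment meets these minimums, the installed versions can be read with `importlib.metadata`. This is a rough sketch: it checks only three of the packages listed above, ignores upper bounds, and uses a simplified numeric comparison (`version_tuple` is a hypothetical helper; `packaging.version` is more robust for pre-releases and unusual version strings).

```python
import re
from importlib import metadata

# Lower bounds copied from the list above; upper bounds are not checked here.
MINIMUMS = {
    "llama-index-finetuning": "0.4.1",
    "llama-index-core": "0.13.0",
    "sentence-transformers": "2.3.0",
}


def version_tuple(version):
    """Convert '2.3.0' to (2, 3, 0), keeping only leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_minimums(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old (x)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


for package, status in check_minimums().items():
    print(f"{package}: {status}")
```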
Credentials
No credentials are required for local embedding finetuning, with one exception:
- `OPENAI_API_KEY`: Required if using OpenAI as the LLM for QA pair generation (`generate_qa_embedding_pairs`)
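A small guard can catch a missing key before any QA generation is attempted. The helper name `openai_key_present` is hypothetical, and the check only confirms that the variable is set to a non-empty value, not that the key is valid.

```python
import os


def openai_key_present(env=None):
    """Return True when OPENAI_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_present():
    print("OPENAI_API_KEY is not set; OpenAI-backed QA generation will likely fail.")
```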
Quick Install
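# Optional, for GPU acceleration: install a CUDA build of PyTorch first
# (replace cu118 with the tag that matches your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu118
# Install the finetuning package (pulls in sentence-transformers and PyTorch)
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "llama-index-finetuning>=0.4.1"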
Code Evidence
Python version requirement from `llama-index-finetuning/pyproject.toml:31`:
requires-python = ">=3.10,<4.0"
Sentence-transformers dependency from `llama-index-finetuning/pyproject.toml:41`:
"sentence-transformers>=2.3.0",
Device auto-detection from `embeddings/adapter.py:74-76`:
if device is None:
    device = infer_torch_device()
    logger.info(f"Use pytorch device: {device}")
self._target_device = torch.device(device)
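The fallback order listed under Compatibility Notes (CUDA, then MPS, then CPU) can be illustrated with a small standalone function. This sketch is not the library's `infer_torch_device()` implementation; `choose_device` and `probe_device` are hypothetical names, and the probe reports `cpu` when PyTorch is not installed.

```python
def choose_device(cuda_available, mps_available):
    """Return the preferred device string given what the machine offers."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def probe_device():
    """Probe PyTorch when it is installed; report 'cpu' when it is not."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may lack the MPS backend module entirely.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


print("Selected device:", probe_device())
```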
Dimension auto-detection hack from `embeddings/adapter.py:62-65`:
# HACK: get dimension by passing text through it
if dim is None:
    test_embedding = self.embed_model.get_text_embedding("hello world")
    self.dim = len(test_embedding)
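The same probing idea can be shown in isolation. In this sketch, `infer_dim` and `toy_embed` are hypothetical stand-ins: any callable that maps a string to a list of floats would work in place of the toy embedder.

```python
def infer_dim(embed_fn, dim=None, probe_text="hello world"):
    """Return dim when given, else the length of one probe embedding."""
    if dim is not None:
        return dim
    return len(embed_fn(probe_text))


def toy_embed(text):
    """Hypothetical stand-in embedder that always returns 4 floats."""
    return [float(len(text)), 0.0, 0.0, 0.0]


print(infer_dim(toy_embed))           # 4
print(infer_dim(toy_embed, dim=384))  # 384
```

Passing `dim` explicitly skips the probe call, which avoids one embedding pass at construction time when the dimension is already known.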
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'sentence_transformers'` | sentence-transformers not installed | `pip install "sentence-transformers>=2.3.0"` |
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM | Reduce `batch_size` (default is 10) or train on CPU |
| `ModuleNotFoundError: No module named 'torch'` | PyTorch not installed | `pip install torch` |
| Python version errors | Python < 3.10 | Upgrade to Python 3.10+ (finetuning package requirement) |
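For the out-of-memory row, one common workaround is to retry with a smaller batch. The sketch below is a generic pattern, not part of llama-index-finetuning: `train_with_backoff` is a hypothetical wrapper, and `fake_train` simulates a trainer that only fits batches of 4 or fewer.

```python
def train_with_backoff(train_fn, batch_size=10, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each OOM error."""
    while True:
        try:
            return train_fn(batch_size), batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or when we cannot shrink further.
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory; retrying with batch_size={batch_size}")


def fake_train(batch_size):
    """Hypothetical trainer that only fits batches of 4 or fewer."""
    if batch_size > 4:
        raise RuntimeError("CUDA out of memory")
    return "trained"


print(train_with_backoff(fake_train))  # ('trained', 2)
```

With a real PyTorch training loop it may also help to release cached GPU memory between attempts, for example with `torch.cuda.empty_cache()`.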
Compatibility Notes
- CUDA: Auto-detected via `infer_torch_device()`. Falls back to MPS (Apple Silicon) then CPU.
- Python Version: The finetuning package requires Python 3.10+, which is stricter than the core package's 3.9+ requirement.
- Adapter vs Full Finetuning: The adapter engine trains a lightweight linear layer on top of frozen embeddings, requiring much less VRAM than full model finetuning.
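A quick calculation shows why a linear adapter is cheap to train. The sketch below assumes a square `dim` x `dim` layer stored in float32. The width of 384 is an illustrative value, not one taken from this page, and the helper names are hypothetical.

```python
def linear_adapter_params(dim, bias=False):
    """Trainable parameter count of a dim x dim linear layer."""
    return dim * dim + (dim if bias else 0)


def params_to_mib(n_params, bytes_per_param=4):
    """Memory for the weights alone, assuming float32 storage."""
    return n_params * bytes_per_param / (1024 ** 2)


dim = 384  # illustrative embedding width, not taken from this page
n = linear_adapter_params(dim)
print(n)                 # 147456
print(params_to_mib(n))  # 0.5625
```

Gradients and the two moment buffers that `AdamW` keeps per parameter add a few multiples of this figure, which still leaves the adapter's training footprint small compared with updating every weight of the underlying embedding model.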
Related Pages
- Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Init
- Implementation:Run_llama_Llama_index_SentenceTransformersFinetuneEngine_Finetune
- Implementation:Run_llama_Llama_index_EmbeddingFinetuneEngine_Get_Model
- Implementation:Run_llama_Llama_index_Generate_QA_Embedding_Pairs
- Implementation:Run_llama_Llama_index_Settings_Embed_Model_Assignment