
Environment:Run llama Llama index Sentence Transformers Finetuning

From Leeroopedia
Revision as of 18:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from environments/Run_llama_Llama_index_Sentence_Transformers_Finetuning.md)
Knowledge Sources
Domains Embedding_Finetuning, Deep_Learning
Last Updated 2026-02-11 19:00 GMT

Overview

Python 3.10+ environment with PyTorch, sentence-transformers, and llama-index-finetuning for embedding model finetuning and adapter training.

Description

This environment provides the deep learning stack required for local embedding finetuning. It includes sentence-transformers for Sentence Transformer model training, PyTorch for tensor operations and GPU acceleration, and the llama-index-finetuning package which wraps these libraries with LlamaIndex-compatible interfaces. The `EmbeddingAdapterFinetuneEngine` uses PyTorch's `AdamW` optimizer and supports CUDA, MPS, and CPU devices.

Usage

Use this environment for the Embedding Finetuning workflow: generating QA pairs, training Sentence Transformer models, and training embedding adapter layers. It is also required for cross-encoder finetuning.
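As a rough illustration of the adapter-training step in this workflow, the sketch below wraps `EmbeddingAdapterFinetuneEngine` in a small helper. The helper name `train_adapter`, its parameters, and the idea of loading the QA dataset from a JSON file are assumptions made for this example and not part of the package. The `batch_size` of 10 mirrors the default mentioned under Common Errors.

```python
from typing import Any


def train_adapter(dataset_path: str, embed_model: Any, output_dir: str = "model_output") -> Any:
    """Train a linear adapter on top of a frozen embedding model (illustrative helper)."""
    # Imported lazily so this module can be loaded before the heavy stack is installed.
    from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
    from llama_index.finetuning import EmbeddingAdapterFinetuneEngine

    # Assumption: the dataset was produced earlier by generate_qa_embedding_pairs
    # and saved to a JSON file at dataset_path.
    dataset = EmbeddingQAFinetuneDataset.from_json(dataset_path)

    engine = EmbeddingAdapterFinetuneEngine(
        dataset,
        embed_model,
        model_output_path=output_dir,
        epochs=1,
        batch_size=10,  # lower this first if GPU memory runs out
    )
    engine.finetune()
    return engine.get_finetuned_model()
```

The base `embed_model` is passed in by the caller, so the helper does not assume any particular embedding integration is installed.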

System Requirements

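| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux or macOS | Windows via WSL |
| Python | >=3.10, <4.0 | Stricter than the core package's 3.9+ requirement |
| Hardware | CPU, NVIDIA GPU (CUDA), or Apple Silicon (MPS) | GPU recommended for faster training |
| VRAM | 4GB+ GPU memory | Applies to GPU training; depends on embedding model size |
| Disk | 2GB+ | For PyTorch, model weights, and checkpoints |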
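The Python constraint in the table can be checked before installing anything. The following is a minimal sketch using only the standard library; the function name `python_is_supported` and the constants are hypothetical names chosen for this example.

```python
import sys
from typing import Tuple

MIN_PYTHON: Tuple[int, int] = (3, 10)  # lower bound of ">=3.10,<4.0"
MAX_PYTHON_MAJOR = 4                   # exclusive upper bound on the major version


def python_is_supported(version: Tuple[int, ...]) -> bool:
    """Return True if `version` satisfies >=3.10,<4.0."""
    return MIN_PYTHON <= tuple(version[:2]) and version[0] < MAX_PYTHON_MAJOR


if __name__ == "__main__":
    ok = python_is_supported(sys.version_info[:3])
    status = "supported" if ok else "unsupported"
    print(f"Python {sys.version.split()[0]}: {status}")
```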

Dependencies

Python Packages

  • `llama-index-finetuning` >= 0.4.1
  • `llama-index-core` >= 0.13.0, < 0.15
  • `sentence-transformers` >= 2.3.0
  • `torch` (pulled in by sentence-transformers)
  • `llama-index-embeddings-adapter` >= 0.4.0, < 0.5
  • `llama-index-llms-azure-openai` >= 0.4.0, < 0.5
  • `llama-index-llms-mistralai` >= 0.7.0, < 0.8
  • `mistralai` >= 1.7.0
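To see which of these distributions are actually present in an environment, a small report can be built with the standard library's `importlib.metadata`. This is a sketch only: the `REQUIRED` tuple is a subset of the list above picked for illustration, and `installed_versions` is a hypothetical helper name. It reports versions but does not compare them against the minimums.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence

# Distribution names taken from the dependency list (illustrative subset).
REQUIRED = (
    "llama-index-finetuning",
    "llama-index-core",
    "sentence-transformers",
    "torch",
)


def installed_versions(names: Sequence[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```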

Credentials

No special credentials required for local embedding finetuning. However:

  • `OPENAI_API_KEY`: Required if using OpenAI as the LLM for QA pair generation (`generate_qa_embedding_pairs`)
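When OpenAI is the LLM for QA pair generation, it can help to fail early if the key is missing. The check below is a minimal sketch; `openai_key_present` is a hypothetical helper, and the optional `env` parameter exists only to make it easy to test.

```python
import os
from typing import Mapping, Optional


def openai_key_present(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if not openai_key_present():
        print("OPENAI_API_KEY is not set; QA pair generation with OpenAI will fail.")
```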

Quick Install

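# Optional, for GPU acceleration: install a CUDA build of PyTorch first
# (cu118 is an example; pick the index URL that matches your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Install the finetuning package (pulls in sentence-transformers and PyTorch)
# Quote the requirement so the shell does not treat ">=" as a redirect
pip install "llama-index-finetuning>=0.4.1"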

Code Evidence

Python version requirement from `llama-index-finetuning/pyproject.toml:31`:

requires-python = ">=3.10,<4.0"

Sentence-transformers dependency from `llama-index-finetuning/pyproject.toml:41`:

"sentence-transformers>=2.3.0",

Device auto-detection from `embeddings/adapter.py:74-76`:

if device is None:
    device = infer_torch_device()
    logger.info(f"Use pytorch device: {device}")
self._target_device = torch.device(device)
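The fallback order that this auto-detection follows (CUDA, then MPS, then CPU) can be sketched as below. This illustrates the ordering only and is not the body of `infer_torch_device()`; the names `pick_device` and `detect_device` are hypothetical.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name in the order CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch when it is installed; otherwise report CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        mps is not None and mps.is_available(),
    )


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

Keeping the decision in a pure function makes the ordering easy to verify without a GPU.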

Dimension auto-detection hack from `embeddings/adapter.py:62-65`:

# HACK: get dimension by passing text through it
if dim is None:
    test_embedding = self.embed_model.get_text_embedding("hello world")
    self.dim = len(test_embedding)
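The same probe pattern can be shown in isolation with a stand-in model. In this sketch, `resolve_dim`, the `TextEmbedder` protocol, and `FakeEmbedder` are all hypothetical names; only the `get_text_embedding("hello world")` call comes from the snippet above.

```python
from typing import List, Optional, Protocol


class TextEmbedder(Protocol):
    def get_text_embedding(self, text: str) -> List[float]: ...


def resolve_dim(embed_model: TextEmbedder, dim: Optional[int] = None) -> int:
    """Use the given dimension, or infer it by embedding a short probe string."""
    if dim is not None:
        return dim
    return len(embed_model.get_text_embedding("hello world"))


class FakeEmbedder:
    """Hypothetical stand-in that returns a fixed-size zero vector."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.calls = 0

    def get_text_embedding(self, text: str) -> List[float]:
        self.calls += 1
        return [0.0] * self.size
```

Passing `dim` explicitly skips the probe call, which may matter when the base embedding model is slow or remote.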

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `ModuleNotFoundError: No module named 'sentence_transformers'` | sentence-transformers not installed | `pip install "sentence-transformers>=2.3.0"` |
| `RuntimeError: CUDA out of memory` | Insufficient GPU VRAM | Reduce `batch_size` (default is 10) or use CPU |
| `ModuleNotFoundError: No module named 'torch'` | PyTorch not installed | `pip install torch` |
| Python version errors | Python < 3.10 | Upgrade to Python 3.10+ (finetuning package requirement) |
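For the out-of-memory case, one possible approach is to retry with a smaller batch size until training fits. The sketch below is generic and not part of llama-index-finetuning: `run_with_smaller_batches` is a hypothetical helper, and `train` is any caller-supplied function that accepts a batch size.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_smaller_batches(
    train: Callable[[int], T],
    batch_size: int = 10,
    min_batch_size: int = 1,
) -> T:
    """Call train(batch_size), halving the batch size after each out-of-memory error."""
    while True:
        try:
            return train(batch_size)
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            # Re-raise unrelated errors, or give up once the minimum is reached.
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

With PyTorch on CUDA, calling `torch.cuda.empty_cache()` between attempts may also help release cached memory.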

Compatibility Notes

  • CUDA: Auto-detected via `infer_torch_device()`. Falls back to MPS (Apple Silicon) then CPU.
  • Python Version: The finetuning package requires Python 3.10+, which is stricter than the core package's 3.9+ requirement.
  • Adapter vs Full Finetuning: The adapter engine trains a lightweight linear layer on top of frozen embeddings, requiring much less VRAM than full model finetuning.
