Implementation:Unslothai Unsloth FastSentenceTransformer
| Knowledge Sources | |
|---|---|
| Domains | NLP, Embeddings, Training |
| Last Updated | 2026-02-07 08:40 GMT |
Overview
Concrete tool for loading, fine-tuning, and saving sentence transformer/embedding models with Unsloth optimizations including LoRA, torch.compile, and quantization.
Description
The FastSentenceTransformer class extends Unsloth's FastModel to support sentence transformer models. It wraps the sentence-transformers library: the inner transformer is loaded through Unsloth's optimized loader, and the SentenceTransformer pipeline is then reconstructed around it with an auto-detected pooling configuration. The class also patches encoder models (MPNet, DistilBERT, BERT) for gradient checkpointing support, applies torch.compile for fast encoder models, and provides custom save_pretrained_torchao and save_pretrained_gguf methods for quantized export.
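To make the pooling step concrete, the following is a minimal pure-Python sketch of the three pooling strategies named on this page (mean, cls, max). It is an illustration only, not Unsloth's or sentence-transformers' code: the function name `pool` and the list-based inputs are assumptions made for this example, whereas the real pooling module works on batched tensors.

```python
from typing import List, Sequence


def pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    mode: str = "mean",
) -> List[float]:
    """Collapse per-token vectors into a single sentence vector.

    Hypothetical helper for illustration. `attention_mask` holds 1 for
    real tokens and 0 for padding tokens.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")

    if mode == "cls":
        # CLS pooling uses the vector of the first token only.
        return list(token_embeddings[0])

    # Mean and max pooling ignore padding positions.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask removes every token")

    if mode == "mean":
        return [sum(column) / len(kept) for column in zip(*kept)]
    if mode == "max":
        return [max(column) for column in zip(*kept)]
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(pool(tokens, mask, "mean"))  # [2.0, 4.0]
print(pool(tokens, mask, "cls"))   # [1.0, 2.0]
print(pool(tokens, mask, "max"))   # [3.0, 6.0]
```

The padding vector is deliberately large here to show that mean and max pooling are unaffected by masked positions.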
Usage
Import this class when you need to fine-tune sentence embedding models with LoRA/QLoRA using Unsloth optimizations and then export them in GGUF or TorchAO format.
Code Reference
Source Location
- Repository: Unslothai_Unsloth
- File: unsloth/models/sentence_transformer.py
- Lines: 1-1853
Signature
class FastSentenceTransformer:
    @staticmethod
    def from_pretrained(
        model_name: str,
        max_seq_length: int = None,
        dtype=None,
        load_in_4bit: bool = False,
        load_in_8bit: bool = False,
        load_in_16bit: bool = True,
        full_finetuning: bool = False,
        token: str = None,
        device_map: str = "sequential",
        rope_scaling=None,
        fix_tokenizer: bool = True,
        trust_remote_code: bool = False,
        use_gradient_checkpointing: bool = False,
        resize_model_vocab=None,
        revision=None,
        use_exact_model_name: bool = False,
        offload_embedding: bool = False,
        random_state: int = 3407,
        max_lora_rank: int = 64,
        disable_log_stats: bool = True,
        qat_scheme=None,
        unsloth_tiled_mlp: bool = False,
        pooling_mode: str = "mean",
        for_inference: bool = False,
        **kwargs,
    ) -> "SentenceTransformer":
        """Load and optimize a SentenceTransformer model."""

    @staticmethod
    def get_peft_model(
        model,
        r: int = 16,
        target_modules: list = ["query", "key", "value", "dense"],
        lora_alpha: int = 16,
        lora_dropout: float = 0.0,
        bias: str = "none",
        layers_to_transform=None,
        layers_pattern=None,
        use_gradient_checkpointing: bool = False,
        random_state: int = 3407,
        max_seq_length: int = 2048,
        use_rslora: bool = False,
        modules_to_save=None,
        init_lora_weights=True,
        loftq_config: dict = {},
        **kwargs,
    ):
        """Apply LoRA to SentenceTransformer model."""
Import
from unsloth.models.sentence_transformer import FastSentenceTransformer
I/O Contract
Inputs (from_pretrained)
| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | HuggingFace model identifier or local path |
| max_seq_length | int | No | Maximum sequence length (auto-detected if None) |
| load_in_4bit | bool | No | Enable 4-bit quantization (default: False) |
| load_in_8bit | bool | No | Enable 8-bit quantization (default: False) |
| pooling_mode | str | No | Pooling strategy: mean, cls, max (default: mean) |
| for_inference | bool | No | Skip Unsloth optimizations (default: False) |
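As a rough guide to what the precision flags mean for memory, the sketch below estimates the size of the model weights alone at 4, 8, and 16 bits per parameter. The helper name, the dictionary keys, and the parameter count are assumptions for illustration; actual usage will be higher because of quantization constants, layers kept in higher precision, activations, and optimizer state.

```python
# Bytes needed to store one weight at each precision (weights only).
BYTES_PER_PARAM = {"4bit": 0.5, "8bit": 1.0, "16bit": 2.0}


def estimate_weight_memory_mib(num_params: int, precision: str) -> float:
    """Approximate weight storage in MiB (hypothetical helper)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 2)


# Illustrative parameter count for a BERT-base-sized encoder (an assumption,
# not a value read from any specific checkpoint).
num_params = 110_000_000
for precision in ("16bit", "8bit", "4bit"):
    size = estimate_weight_memory_mib(num_params, precision)
    print(f"{precision}: ~{size:.0f} MiB")
```

The keys loosely correspond to load_in_16bit, load_in_8bit, and load_in_4bit; how the loader resolves these flags when more than one is set is not covered by this sketch.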
Inputs (get_peft_model)
| Name | Type | Required | Description |
|---|---|---|---|
| model | SentenceTransformer | Yes | Loaded sentence transformer model |
| r | int | No | LoRA rank (default: 16) |
| target_modules | list | No | Names of the modules to apply LoRA to (default: query, key, value, dense) |
| lora_alpha | int | No | LoRA scaling parameter (default: 16) |
| use_gradient_checkpointing | bool | No | Enable gradient checkpointing (default: False) |
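The interaction between r, lora_alpha, and use_rslora can be sketched with a few lines of arithmetic. This follows the standard LoRA formulation (an update of the form B·A scaled by lora_alpha / r, or lora_alpha / sqrt(r) for rank-stabilized LoRA); the helper names and the 768-wide layer are assumptions for illustration and are not taken from the Unsloth source.

```python
import math


def lora_scaling(r: int, lora_alpha: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


def lora_trainable_params(in_features: int, out_features: int, r: int) -> int:
    """Adapter weights for one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# With the defaults listed above (r=16, lora_alpha=16):
print(lora_scaling(16, 16))                   # 1.0
print(lora_scaling(16, 16, use_rslora=True))  # 4.0

# Hypothetical 768 x 768 projection, as found in BERT-base-sized encoders.
full = 768 * 768
adapter = lora_trainable_params(768, 768, r=16)
print(adapter, full)  # 24576 589824
```

Under these assumptions, the adapter for one such layer trains roughly 4% as many weights as the full projection, which is why raising r increases both capacity and memory cost.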
Outputs
| Name | Type | Description |
|---|---|---|
| from_pretrained returns | SentenceTransformer | Optimized sentence transformer model |
| get_peft_model returns | SentenceTransformer | Model with LoRA adapters applied |
Usage Examples
Fine-tune Sentence Embedding Model
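from unsloth.models.sentence_transformer import FastSentenceTransformer

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# 1. Load model with 4-bit quantization
model = FastSentenceTransformer.from_pretrained(
    "BAAI/bge-base-en-v1.5",
    load_in_4bit=True,
    max_seq_length=512,
)

# 2. Apply LoRA
model = FastSentenceTransformer.get_peft_model(
    model,
    r=16,
    target_modules=["query", "key", "value", "dense"],
    lora_alpha=16,
)

# 3. Train with the sentence-transformers trainer.
# Placeholder (anchor, positive) pairs for illustration only;
# replace them with your own training data.
train_dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "How do plants make food?"],
    "positive": ["Paris is the capital of France.", "Plants make food through photosynthesis."],
})

# Illustrative loss choice suited to (anchor, positive) pairs.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()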
GGUF Export
# Save in GGUF format
model.save_pretrained_gguf(
    "./gguf_output",
    quantization_method="q4_k_m",
)