Implementation:Vibrantlabsai Ragas HuggingFaceEmbeddings
| Knowledge Sources | |
|---|---|
| Domains | Embeddings, HuggingFace, LLM Evaluation |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
HuggingFaceEmbeddings provides a unified embedding interface supporting both local sentence-transformers models and the HuggingFace Inference API for hosted models within the Ragas framework.
Description
The HuggingFaceEmbeddings class extends BaseRagasEmbedding and supports two distinct modes of operation controlled by the use_api parameter:
- Local mode (default): Uses the sentence-transformers library to load and run models directly on the local machine. Supports GPU acceleration via the device parameter and configurable normalize_embeddings behavior.
- API mode: Uses the huggingface-hub InferenceClient to call HuggingFace's hosted inference endpoints, suitable for models that are too large to run locally or when GPU resources are unavailable.
The class provides efficient batch processing through the embed_texts and aembed_texts methods. In local mode, batch embedding leverages sentence-transformers' native batch processing with a configurable batch_size (default 32). In API mode, texts are processed in batches using the feature_extraction endpoint.
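As a rough illustration of the batching described above, the sketch below splits a list of texts into chunks of at most batch_size items. The helper name iter_batches is hypothetical and is not taken from the Ragas source; the actual implementation may differ.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# 70 texts with the default batch size of 32 give two full batches and one partial batch.
texts = [f"document {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_lengths)  # [32, 32, 6]
```

Each batch would then be handed to the embedding backend in a single call, which is what keeps the number of model invocations or HTTP requests low.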
Asynchronous operations are supported through the run_sync_in_async utility, which runs the synchronous embedding calls in a thread pool executor, because the clients this class uses (SentenceTransformer and the synchronous InferenceClient) expose blocking calls only.
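The thread-pool pattern can be sketched with the standard library alone. This is a minimal illustration, not the Ragas implementation: run_in_thread and blocking_embed are hypothetical stand-ins for run_sync_in_async and for a synchronous embedding call.

```python
import asyncio
from functools import partial
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


async def run_in_thread(func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Run a blocking callable in the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(func, *args, **kwargs))


def blocking_embed(text: str) -> List[float]:
    """Stand-in for a synchronous embedding call; returns a toy 2-d vector."""
    return [float(len(text)), float(text.count(" "))]


async def main() -> List[List[float]]:
    # Both blocking calls are dispatched to worker threads; gather keeps input order.
    return await asyncio.gather(
        run_in_thread(blocking_embed, "hello world"),
        run_in_thread(blocking_embed, "rag"),
    )


results = asyncio.run(main())
print(results)  # [[11.0, 1.0], [3.0, 0.0]]
```

Offloading to a thread keeps the event loop responsive while the model or HTTP call is in progress, at the cost of one worker thread per in-flight call.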
Usage
Use this class when you need HuggingFace-based embeddings for Ragas evaluations. Choose local mode when privacy matters (no data leaves your machine) or when you have hardware that can run the model efficiently; choose API mode for convenience when you do not have local GPU resources. The class is particularly useful for evaluations that require open-source embedding models.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/embeddings/huggingface_provider.py
Signature
class HuggingFaceEmbeddings(BaseRagasEmbedding):
    PROVIDER_NAME = "huggingface"
    REQUIRES_MODEL = True

    def __init__(
        self,
        model: str,
        use_api: bool = False,
        api_key: Optional[str] = None,
        device: Optional[str] = None,
        normalize_embeddings: bool = True,
        batch_size: int = 32,
        cache: Optional[CacheInterface] = None,
        **model_kwargs: Any,
    ): ...
Import
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | The HuggingFace model name or path (e.g., "BAAI/bge-small-en-v1.5") |
| use_api | bool | No | If True, use HuggingFace Inference API instead of local model; defaults to False |
| api_key | Optional[str] | No | HuggingFace API token; required when use_api is True |
| device | Optional[str] | No | Device for local model inference (e.g., "cuda", "cpu"); only used in local mode |
| normalize_embeddings | bool | No | Whether to L2-normalize embeddings; defaults to True; only used in local mode |
| batch_size | int | No | Number of texts to process per batch; defaults to 32 |
| cache | Optional[CacheInterface] | No | Cache backend for storing and retrieving embedding results |
| **model_kwargs | Any | No | Additional keyword arguments passed to the SentenceTransformer constructor |
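To make the normalize_embeddings option concrete, here is a minimal pure-Python sketch of L2 normalization. The function names l2_normalize and dot are hypothetical and exist only for this illustration; in local mode the equivalent operation is expected to happen inside sentence-transformers on tensors rather than on Python lists.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
print(a)          # [0.6, 0.8]
print(dot(a, b))  # cosine similarity of the two original vectors
```

Once vectors have unit length, their dot product equals their cosine similarity, which is why normalization is a common default for similarity-based metrics.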
Outputs
embed_text / aembed_text
| Name | Type | Description |
|---|---|---|
| return | List[float] | A list of floats representing the embedding vector for a single text |
embed_texts / aembed_texts
| Name | Type | Description |
|---|---|---|
| return | List[List[float]] | A list of embedding vectors, one per input text |
Usage Examples
Local Model Usage
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings
# Use a local sentence-transformers model
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    device="cuda",  # assumes a CUDA-capable GPU; use "cpu" otherwise
    normalize_embeddings=True,
)
# Embed a single text
vector = embeddings.embed_text("What is retrieval-augmented generation?")
print(len(vector))  # 384 for bge-small
# Embed multiple texts
vectors = embeddings.embed_texts([
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks.",
])
print(len(vectors))  # 2
API Mode Usage
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings
# Use HuggingFace Inference API
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    use_api=True,
    api_key="hf_your_token_here",
)
vector = embeddings.embed_text("What is RAG?")
With Caching
from ragas.cache import DiskCacheBackend
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings
cache = DiskCacheBackend(cache_dir=".hf_embedding_cache")
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    cache=cache,
)
# The first call computes the embedding; later calls with the same input reuse the cached result
vector = embeddings.embed_text("Cached embedding example")
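The compute-once behaviour noted in the comment above can be illustrated with a small in-memory stand-in. This sketch does not reproduce how Ragas or DiskCacheBackend store results; CachedEmbedder and fake_embed are hypothetical names used only to show the effect of a cache on repeated calls.

```python
from typing import Callable, Dict, List


class CachedEmbedder:
    """Wrap an embedding function with a dictionary cache keyed by the input text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the wrapped function was actually called

    def embed_text(self, text: str) -> List[float]:
        if text not in self._cache:
            self.misses += 1
            self._cache[text] = self._embed_fn(text)
        return self._cache[text]


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Toy embedding that records each invocation."""
    calls.append(text)
    return [float(len(text))]


embedder = CachedEmbedder(fake_embed)
first = embedder.embed_text("Cached embedding example")
second = embedder.embed_text("Cached embedding example")
print(first == second, len(calls))  # True 1
```

A disk-backed cache follows the same idea but persists entries across processes, so repeated evaluation runs over the same dataset can skip recomputation.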