Implementation:Vibrantlabsai Ragas HuggingFaceEmbeddings

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Embeddings, HuggingFace, LLM Evaluation
Last Updated	2026-02-12 00:00 GMT

Overview

HuggingFaceEmbeddings provides a unified embedding interface supporting both local sentence-transformers models and the HuggingFace Inference API for hosted models within the Ragas framework.

Description

The HuggingFaceEmbeddings class extends BaseRagasEmbedding and supports two distinct modes of operation controlled by the use_api parameter:

Local mode (default): Uses the sentence-transformers library to load and run models directly on the local machine. Supports GPU acceleration via the device parameter and configurable normalize_embeddings behavior.
API mode: Uses the huggingface-hub InferenceClient to call HuggingFace's hosted inference endpoints, suitable for models that are too large to run locally or when GPU resources are unavailable.

The class provides efficient batch processing through the embed_texts and aembed_texts methods. In local mode, batch embedding leverages sentence-transformers' native batch processing with a configurable batch_size (default 32). In API mode, texts are processed in batches using the feature_extraction endpoint.
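The batching behaviour described above can be illustrated with a minimal sketch. It is not the Ragas implementation: embed_in_batches and fake_embed_batch are hypothetical names, and the stand-in embedder only returns text lengths so the example runs without any model.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed texts in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        # The final batch may be shorter than batch_size.
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for a model call: one 1-dimensional vector per text.
    return [[float(len(text))] for text in batch]
```

With 70 input texts and a batch_size of 32, this sketch would issue three calls of sizes 32, 32, and 6, and return 70 vectors in input order.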

Asynchronous operations are supported through the run_sync_in_async utility, which runs synchronous embedding calls in a thread pool executor because the HuggingFace hub library does not provide native async support.
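The general pattern of offloading a blocking call to a thread pool can be sketched as follows. This is an illustration using only the standard library, not the source of run_sync_in_async; run_blocking_in_thread and blocking_embed are hypothetical names.

```python
import asyncio
from functools import partial
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


async def run_blocking_in_thread(
    func: Callable[..., T], *args: Any, **kwargs: Any
) -> T:
    """Run a blocking callable in the event loop's default thread pool."""
    loop = asyncio.get_running_loop()
    # run_in_executor accepts only positional args, so bind kwargs with partial.
    return await loop.run_in_executor(None, partial(func, *args, **kwargs))


def blocking_embed(text: str) -> List[float]:
    # Hypothetical stand-in for a synchronous embedding call.
    return [float(len(text))]


async def main() -> List[List[float]]:
    # Both calls run on worker threads, so the event loop is not blocked.
    return await asyncio.gather(
        run_blocking_in_thread(blocking_embed, "hello"),
        run_blocking_in_thread(blocking_embed, "hi"),
    )
```

Calling asyncio.run(main()) returns the two vectors in the order the coroutines were passed to asyncio.gather.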

Usage

Use this class when you need HuggingFace-based embeddings for Ragas evaluations. Choose local mode when privacy matters (no data leaves your machine) or when you want to avoid per-request network latency, and API mode for convenience when you do not have local GPU resources. The class is particularly useful for evaluations that require open-source embedding models.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/embeddings/huggingface_provider.py

Signature

class HuggingFaceEmbeddings(BaseRagasEmbedding):
    PROVIDER_NAME = "huggingface"
    REQUIRES_MODEL = True

    def __init__(
        self,
        model: str,
        use_api: bool = False,
        api_key: Optional[str] = None,
        device: Optional[str] = None,
        normalize_embeddings: bool = True,
        batch_size: int = 32,
        cache: Optional[CacheInterface] = None,
        **model_kwargs: Any,
    ): ...

Import

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

I/O Contract

Inputs

Name	Type	Required	Description
model	str	Yes	The HuggingFace model name or path (e.g., "BAAI/bge-small-en-v1.5")
use_api	bool	No	If True, use HuggingFace Inference API instead of local model; defaults to False
api_key	Optional[str]	No	HuggingFace API token; required when use_api is True
device	Optional[str]	No	Device for local model inference (e.g., "cuda", "cpu"); only used in local mode
normalize_embeddings	bool	No	Whether to L2-normalize embeddings; defaults to True; only used in local mode
batch_size	int	No	Number of texts to process per batch; defaults to 32
cache	Optional[CacheInterface]	No	Cache backend for storing and retrieving embedding results
**model_kwargs	Any	No	Additional keyword arguments passed to the SentenceTransformer constructor
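To make the normalize_embeddings option concrete, the sketch below shows what L2 normalization does to a vector. It is a plain-Python illustration, not code from Ragas or sentence-transformers; l2_normalize and dot are hypothetical helper names.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Once two vectors are L2-normalized, their dot product equals their cosine similarity, which is why normalized embeddings are convenient for similarity-based comparisons.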

Outputs

embed_text / aembed_text

Name	Type	Description
return	List[float]	A list of floats representing the embedding vector for a single text

embed_texts / aembed_texts

Name	Type	Description
return	List[List[float]]	A list of embedding vectors, one per input text

Usage Examples

Local Model Usage

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

# Use a local sentence-transformers model
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    device="cuda",  # assumes a CUDA-capable GPU is available; use "cpu" otherwise
    normalize_embeddings=True,
)

# Embed a single text
vector = embeddings.embed_text("What is retrieval-augmented generation?")
print(len(vector))  # 384 for bge-small

# Embed multiple texts
vectors = embeddings.embed_texts([
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks.",
])
print(len(vectors))  # 2

API Mode Usage

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

# Use HuggingFace Inference API
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    use_api=True,
    api_key="hf_your_token_here",
)

vector = embeddings.embed_text("What is RAG?")

With Caching

from ragas.cache import DiskCacheBackend
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

cache = DiskCacheBackend(cache_dir=".hf_embedding_cache")
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    cache=cache,
)

# First call computes; subsequent calls with same input use cache
vector = embeddings.embed_text("Cached embedding example")
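The compute-once, reuse-afterwards behaviour in the example above can be pictured with a small in-memory sketch. It does not reflect how the Ragas cache backends build keys or store values; MemoizedEmbedder is a hypothetical class that caches by exact input text only.

```python
from typing import Callable, Dict, List


class MemoizedEmbedder:
    """Hypothetical wrapper that caches vectors by exact input text."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying embedder was called

    def embed_text(self, text: str) -> List[float]:
        if text not in self._cache:
            self.misses += 1
            self._cache[text] = self._embed(text)
        # Return a copy so callers cannot mutate the cached vector.
        return list(self._cache[text])
```

Repeated calls with the same text leave misses unchanged, while a new text increments it by one.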

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
