Implementation:Vibrantlabsai Ragas HuggingFaceEmbeddings

From Leeroopedia
Knowledge Sources
Domains: Embeddings, HuggingFace, LLM Evaluation
Last Updated: 2026-02-12 00:00 GMT

Overview

HuggingFaceEmbeddings provides a unified embedding interface supporting both local sentence-transformers models and the HuggingFace Inference API for hosted models within the Ragas framework.

Description

The HuggingFaceEmbeddings class extends BaseRagasEmbedding and supports two distinct modes of operation controlled by the use_api parameter:

  • Local mode (default): Uses the sentence-transformers library to load and run models directly on the local machine. Supports GPU acceleration via the device parameter and configurable normalize_embeddings behavior.
  • API mode: Uses the huggingface-hub InferenceClient to call HuggingFace's hosted inference endpoints, suitable for models that are too large to run locally or when GPU resources are unavailable.

The class provides efficient batch processing through the embed_texts and aembed_texts methods. In local mode, batch embedding leverages sentence-transformers' native batch processing with a configurable batch_size (default 32). In API mode, texts are processed in batches using the feature_extraction endpoint.
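
To make the batching behavior concrete, the following is a minimal sketch of how a list of texts can be split into chunks of at most batch_size items before each chunk is sent to a model or endpoint. The helper name iter_batches is hypothetical and is not part of Ragas; the actual implementation may batch differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` texts (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# 70 texts with the default batch size of 32 give two full batches and a remainder.
batches = list(iter_batches([f"doc {i}" for i in range(70)], batch_size=32))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

A larger batch_size usually means fewer calls at the cost of more memory per call, so the best value depends on the model and the hardware.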

The class has no native async code path. Asynchronous operations go through the run_sync_in_async utility, which runs the synchronous embedding calls in a thread pool executor so that they do not block the event loop.
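
The general pattern of running a blocking call in a thread pool can be sketched with the standard library alone. This is an illustration of the technique, not the Ragas run_sync_in_async implementation; embed_sync and aembed are hypothetical stand-ins for a blocking embedding call and its async wrapper.

```python
import asyncio
from typing import List


def embed_sync(text: str) -> List[float]:
    """Hypothetical blocking embedding call; returns a toy two-dimensional vector."""
    return [float(len(text)), 1.0]


async def aembed(text: str) -> List[float]:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, embed_sync, text)


async def main() -> List[List[float]]:
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(aembed("a"), aembed("abc"))


results = asyncio.run(main())
print(results)
```

Because the work still happens in threads, this keeps the event loop free but does not by itself make a CPU-bound or GPU-bound model run faster.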

Usage

Use this class when you need HuggingFace-based embeddings for Ragas evaluations. Choose local mode for maximum throughput and privacy (no data leaves your machine), or API mode for convenience when you do not have local GPU resources. The class is particularly useful for evaluations that require open-source embedding models.

Code Reference

Source Location

Signature

class HuggingFaceEmbeddings(BaseRagasEmbedding):
    PROVIDER_NAME = "huggingface"
    REQUIRES_MODEL = True

    def __init__(
        self,
        model: str,
        use_api: bool = False,
        api_key: Optional[str] = None,
        device: Optional[str] = None,
        normalize_embeddings: bool = True,
        batch_size: int = 32,
        cache: Optional[CacheInterface] = None,
        **model_kwargs: Any,
    ): ...

Import

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | str | Yes | The HuggingFace model name or path (e.g., "BAAI/bge-small-en-v1.5") |
| use_api | bool | No | If True, use HuggingFace Inference API instead of local model; defaults to False |
| api_key | Optional[str] | No | HuggingFace API token; required when use_api is True |
| device | Optional[str] | No | Device for local model inference (e.g., "cuda", "cpu"); only used in local mode |
| normalize_embeddings | bool | No | Whether to L2-normalize embeddings; defaults to True; only used in local mode |
| batch_size | int | No | Number of texts to process per batch; defaults to 32 |
| cache | Optional[CacheInterface] | No | Cache backend for storing and retrieving embedding results |
| **model_kwargs | Any | No | Additional keyword arguments passed to the SentenceTransformer constructor |
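
For readers unfamiliar with the normalize_embeddings option, L2 normalization divides a vector by its Euclidean length so that the result has length 1. The sketch below shows the arithmetic in plain Python; l2_normalize is a hypothetical helper written for illustration, since in local mode the normalization is described as being handled by the underlying library.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale `vector` to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])  # the input has length 5, so each component is divided by 5
unit_length = math.sqrt(sum(x * x for x in unit))
print(unit, unit_length)
```

With unit-length vectors, the dot product of two embeddings equals their cosine similarity, which is one common reason to leave normalization enabled.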

Outputs

embed_text / aembed_text

| Name | Type | Description |
|------|------|-------------|
| return | List[float] | A list of floats representing the embedding vector for a single text |

embed_texts / aembed_texts

| Name | Type | Description |
|------|------|-------------|
| return | List[List[float]] | A list of embedding vectors, one per input text |

Usage Examples

Local Model Usage

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

# Use a local sentence-transformers model
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    device="cuda",
    normalize_embeddings=True,
)

# Embed a single text
vector = embeddings.embed_text("What is retrieval-augmented generation?")
print(len(vector))  # 384 for bge-small

# Embed multiple texts
vectors = embeddings.embed_texts([
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks.",
])
print(len(vectors))  # 2
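
Because the returned vectors are plain lists of floats, they can be compared without extra dependencies. The following sketch computes cosine similarity on toy two-dimensional vectors rather than real model output; cosine_similarity is a hypothetical helper and not part of Ragas.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings: same direction, then perpendicular.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 3.0])
print(same_direction, perpendicular)
```

In practice the two arguments would be entries of the list returned by embed_texts.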

API Mode Usage

from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

# Use HuggingFace Inference API
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    use_api=True,
    api_key="hf_your_token_here",
)

vector = embeddings.embed_text("What is RAG?")
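
In real code it is usually safer to read the token from the environment than to hard-code it. The sketch below assumes the token is stored in the HF_TOKEN environment variable; resolve_hf_token is a hypothetical helper, and its result would be passed as the api_key argument.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer an explicitly passed token, then fall back to HF_TOKEN (hypothetical helper)."""
    token = explicit or env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("No HuggingFace token found: pass one explicitly or set HF_TOKEN")
    return token


# The env mapping is injectable so the lookup can be exercised without touching os.environ.
token_from_env = resolve_hf_token(env={"HF_TOKEN": "hf_example"})
token_explicit = resolve_hf_token("hf_explicit", env={"HF_TOKEN": "hf_example"})
print(token_from_env, token_explicit)
```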

With Caching

from ragas.cache import DiskCacheBackend
from ragas.embeddings.huggingface_provider import HuggingFaceEmbeddings

cache = DiskCacheBackend(cache_dir=".hf_embedding_cache")
embeddings = HuggingFaceEmbeddings(
    model="BAAI/bge-small-en-v1.5",
    cache=cache,
)

# First call computes; subsequent calls with same input use cache
vector = embeddings.embed_text("Cached embedding example")
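
The compute-once behavior described in the comment above can be illustrated with a small in-memory sketch. This is not how the Ragas cache backends are implemented; CountingEmbedder and with_cache are hypothetical names used only to show that a repeated input skips the expensive call.

```python
from typing import Callable, Dict, List


class CountingEmbedder:
    """Hypothetical embedder that counts how often the expensive path runs."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]


def with_cache(embed: Callable[[str], List[float]]) -> Callable[[str], List[float]]:
    """Wrap an embedding function with an in-memory cache keyed by the input text."""
    store: Dict[str, List[float]] = {}

    def cached(text: str) -> List[float]:
        if text not in store:
            store[text] = embed(text)
        # Return a copy so callers cannot mutate the cached vector.
        return list(store[text])

    return cached


embedder = CountingEmbedder()
cached_embed = with_cache(embedder.embed)
first = cached_embed("Cached embedding example")
second = cached_embed("Cached embedding example")
print(embedder.calls)
```

A disk-backed cache follows the same idea but persists results between runs, which matters when the same evaluation set is embedded repeatedly.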
