Implementation:CrewAIInc CrewAI RAG Embedding Service

From Leeroopedia
Knowledge Sources
Domains RAG, Embeddings, AI_ML
Last Updated 2026-02-11 00:00 GMT

Overview

Unified embedding service that supports 18 embedding providers through CrewAI's embedding factory and provides the vector representation layer for the RAG system.

Description

This module implements two classes:

EmbeddingConfig is a Pydantic BaseModel that holds configuration for embedding providers:

  • provider: Embedding provider name (e.g., "openai", "cohere", "ollama")
  • model: Model name (e.g., "text-embedding-3-small")
  • api_key: Optional API key (auto-resolved from environment variables)
  • timeout: Request timeout in seconds (default: 30.0)
  • max_retries: Maximum retries (default: 3)
  • batch_size: Batch size for processing multiple texts (default: 100)
  • extra_config: Additional provider-specific configuration

EmbeddingService is the main class that wraps CrewAI's embedding providers with a consistent API. It:

  • Uses CrewAI's build_embedder() factory to create provider-specific embedding functions
  • Maps provider names to environment variable names for automatic API key resolution
  • Builds provider-specific configuration dictionaries with the correct parameter naming conventions
  • Provides embed_text() for single text embedding and embed_batch() for bulk processing with automatic batching
  • Includes utility methods: get_embedding_dimension(), validate_connection(), get_service_info()
  • Offers 13 convenience factory methods, one for each of 12 common providers plus one for custom embedders: create_openai_service(), create_voyage_service(), create_cohere_service(), create_gemini_service(), create_azure_service(), create_bedrock_service(), create_huggingface_service(), create_sentence_transformer_service(), create_ollama_service(), create_jina_service(), create_instructor_service(), create_watsonx_service(), and create_custom_service()
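
The automatic batching behind embed_batch() can be pictured with a minimal sketch. It is not the module's actual code: iter_batches, embed_in_batches, and fake_embed are hypothetical names, and the only detail taken from this page is the default batch_size of 100.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_batches(texts: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_in_batches(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Call embed_fn once per batch and concatenate the results in input order."""
    vectors: list[list[float]] = []
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a provider call: one number per text."""
    return [[float(len(text))] for text in batch]
```

With this sketch, 250 texts and the default batch size would produce three provider calls of 100, 100, and 50 texts, and the output order matches the input order.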

Supported providers: openai, azure, amazon-bedrock, cohere, google-generativeai, google-vertex, huggingface, jina, ollama, sentence-transformer, instructor, onnx, roboflow, openclip, text2vec, voyageai, watsonx, custom
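
A provider name can be checked against this list before a service is built. The sketch below is an illustration only: check_provider is a hypothetical helper, and the normalization (strip and lowercase) and the ValueError are assumptions rather than documented behavior of EmbeddingService.

```python
from __future__ import annotations

import difflib

# Provider names copied from the "Supported providers" list on this page.
SUPPORTED_PROVIDERS = (
    "openai", "azure", "amazon-bedrock", "cohere", "google-generativeai",
    "google-vertex", "huggingface", "jina", "ollama", "sentence-transformer",
    "instructor", "onnx", "roboflow", "openclip", "text2vec", "voyageai",
    "watsonx", "custom",
)


def check_provider(name: str) -> str:
    """Return the normalized provider name, or raise ValueError with suggestions."""
    normalized = name.strip().lower()  # assumption: names are case-insensitive
    if normalized in SUPPORTED_PROVIDERS:
        return normalized
    hints = difflib.get_close_matches(normalized, SUPPORTED_PROVIDERS, n=3)
    suffix = f" Did you mean: {', '.join(hints)}?" if hints else ""
    raise ValueError(f"Unsupported embedding provider {name!r}.{suffix}")
```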

Usage

Use this service when you need to generate vector embeddings for text in the RAG pipeline. Choose from cloud APIs (OpenAI, Cohere), local models (Ollama, sentence-transformers), or custom implementations without changing pipeline code.
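
One way to keep pipeline code independent of the provider is to depend only on the embed_text() and embed_batch() methods. The sketch below assumes nothing beyond those two method names: TextEmbedder, rank_documents, and LetterCountEmbedder are hypothetical, and the letter-count embedder is an offline stand-in, not a real provider.

```python
from __future__ import annotations

import math
from typing import Protocol


class TextEmbedder(Protocol):
    """Structural type: any object exposing these two methods fits."""

    def embed_text(self, text: str) -> list[float]: ...
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    embedder: TextEmbedder, query: str, docs: list[str]
) -> list[tuple[str, float]]:
    """Return (document, score) pairs sorted by similarity to the query."""
    query_vec = embedder.embed_text(query)
    doc_vecs = embedder.embed_batch(docs)
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


class LetterCountEmbedder:
    """Hypothetical offline embedder: a 26-dimensional letter histogram."""

    def embed_text(self, text: str) -> list[float]:
        lowered = text.lower()
        return [float(lowered.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_text(text) for text in texts]
```

Because rank_documents() only relies on the two method names, an EmbeddingService instance could be passed in place of LetterCountEmbedder, provided it returns the types listed in the I/O Contract.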

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/embedding_service.py
  • Lines: 1-511

Signature (EmbeddingService)

class EmbeddingService:
    """Enhanced embedding service that uses CrewAI's existing embedding providers."""

    def __init__(
        self,
        provider: str = "openai",
        model: str = "text-embedding-3-small",
        api_key: str | None = None,
        **kwargs: Any,
    ):

Signature (EmbeddingConfig)

class EmbeddingConfig(BaseModel):
    """Configuration for embedding providers."""

    provider: str = Field(description="Embedding provider name")
    model: str = Field(description="Model name to use")
    api_key: str | None = Field(default=None, description="API key for the provider")
    timeout: float | None = Field(default=30.0, description="Request timeout in seconds")
    max_retries: int = Field(default=3, description="Maximum number of retries")
    batch_size: int = Field(default=100, description="Batch size for processing multiple texts")
    extra_config: dict[str, Any] = Field(default_factory=dict, description="Additional provider-specific configuration")

Import

from crewai_tools.rag.embedding_service import EmbeddingService, EmbeddingConfig

I/O Contract

Inputs (Constructor)

Name       Type          Required   Description
provider   str           No         Embedding provider name (default: "openai")
model      str           No         Model name (default: "text-embedding-3-small")
api_key    str or None   No         API key; if not provided, resolves from environment variables
**kwargs   Any           No         Additional config: timeout, max_retries, batch_size, extra_config

embed_text()

Name   Type   Required   Description
text   str    Yes        Single text string to embed

embed_batch()

Name    Type        Required   Description
texts   list[str]   Yes        List of texts to embed

Outputs

Name                         Type                Description
embed_text()                 list[float]         Single embedding vector, or empty list for empty input
embed_batch()                list[list[float]]   List of embedding vectors, processed in batches
get_embedding_dimension()    int or None         Dimension of embedding vectors, or None if unknown
validate_connection()        bool                True if embedding service is working correctly
get_service_info()           dict                Dictionary with provider, model, dimension, batch_size, is_connected
list_supported_providers()   list[str]           List of all 18 supported provider names
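
The return values above can be illustrated with a small stand-in class. This is a sketch of the contract, not the real implementation: ContractSketch is hypothetical, and probing the dimension by embedding a short test string, treating whitespace-only input as empty, and deriving validate_connection() from the probe are all assumptions.

```python
from __future__ import annotations

from typing import Callable


class ContractSketch:
    """Hypothetical wrapper mirroring the documented return types."""

    def __init__(self, embed_fn: Callable[[list[str]], list[list[float]]]):
        self._embed_fn = embed_fn

    def embed_text(self, text: str) -> list[float]:
        # Documented: empty input yields an empty list.
        # Assumption: whitespace-only input is treated the same way.
        if not text or not text.strip():
            return []
        return self._embed_fn([text])[0]

    def get_embedding_dimension(self) -> int | None:
        # Assumption: the dimension is found by embedding a probe string.
        try:
            vector = self.embed_text("dimension probe")
        except Exception:
            return None
        return len(vector) if vector else None

    def validate_connection(self) -> bool:
        # Assumption: the service counts as working if the probe succeeds.
        return self.get_embedding_dimension() is not None
```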

Environment Variable Mapping

Provider              Environment Variable
openai                OPENAI_API_KEY
azure                 AZURE_OPENAI_API_KEY
amazon-bedrock        AWS_ACCESS_KEY_ID
cohere                COHERE_API_KEY
google-generativeai   GOOGLE_API_KEY
google-vertex         GOOGLE_APPLICATION_CREDENTIALS
huggingface           HUGGINGFACE_API_KEY
jina                  JINA_API_KEY
ollama                (none - runs locally)
voyageai              VOYAGE_API_KEY
watsonx               WATSONX_API_KEY
roboflow              ROBOFLOW_API_KEY
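
The key resolution described by this table can be sketched as a lookup with an explicit override. The mapping is copied from the table; resolve_api_key is a hypothetical helper, and the precedence (explicit argument first, then environment) is an assumption based on the constructor description.

```python
from __future__ import annotations

import os

# Copied from the table above; ollama is omitted because it needs no key.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
    "amazon-bedrock": "AWS_ACCESS_KEY_ID",
    "cohere": "COHERE_API_KEY",
    "google-generativeai": "GOOGLE_API_KEY",
    "google-vertex": "GOOGLE_APPLICATION_CREDENTIALS",
    "huggingface": "HUGGINGFACE_API_KEY",
    "jina": "JINA_API_KEY",
    "voyageai": "VOYAGE_API_KEY",
    "watsonx": "WATSONX_API_KEY",
    "roboflow": "ROBOFLOW_API_KEY",
}


def resolve_api_key(provider: str, api_key: str | None = None) -> str | None:
    """Prefer an explicit key; otherwise read the provider's environment variable."""
    if api_key:
        return api_key
    env_var = PROVIDER_ENV_VARS.get(provider)
    if env_var is None:
        return None  # provider has no mapped variable (for example, ollama)
    return os.environ.get(env_var)
```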

Usage Examples

Basic Usage

from crewai_tools.rag.embedding_service import EmbeddingService

# Create with default OpenAI provider
service = EmbeddingService()

# Embed a single text
embedding = service.embed_text("Hello, world!")

# Embed multiple texts in batches
embeddings = service.embed_batch(["text1", "text2", "text3"])

# Check service info
info = service.get_service_info()
# {'provider': 'openai', 'model': 'text-embedding-3-small', 'embedding_dimension': 1536, ...}

Using Factory Methods

from crewai_tools.rag.embedding_service import EmbeddingService

# OpenAI
service = EmbeddingService.create_openai_service(model="text-embedding-3-large")

# Voyage AI
service = EmbeddingService.create_voyage_service(model="voyage-2")

# Local Ollama
service = EmbeddingService.create_ollama_service(model="nomic-embed-text")

# Custom embedding function (my_custom_embedder is a placeholder for a callable you supply)
service = EmbeddingService.create_custom_service(
    embedding_callable=my_custom_embedder
)
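
The my_custom_embedder name above is left undefined on this page. As a hedged illustration, a custom embedder could be a callable that maps a list of texts to a list of vectors. The interface create_custom_service() actually expects (an instance, a class, or a plain function) is not stated here, so the shape below is an assumption, and HashingEmbedder is a hypothetical toy that carries no semantic meaning.

```python
from __future__ import annotations

import hashlib


class HashingEmbedder:
    """Hypothetical deterministic embedder built from SHA-256 bytes."""

    def __init__(self, dimension: int = 8):
        # A SHA-256 digest has 32 bytes, which caps the dimension.
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")
        self.dimension = dimension

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte into the range [0.0, 1.0].
        return [digest[i] / 255.0 for i in range(self.dimension)]

    def __call__(self, input: list[str]) -> list[list[float]]:
        # Assumed signature: list of texts in, list of vectors out.
        return [self._embed_one(text) for text in input]


my_custom_embedder = HashingEmbedder(dimension=8)
```

A real custom embedder would wrap an actual model; the hashing version is only useful for offline tests where stable, repeatable vectors matter more than quality.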

With Cohere

from crewai_tools.rag.embedding_service import EmbeddingService

service = EmbeddingService.create_cohere_service(
    model="embed-english-v3.0",
    api_key="your-cohere-key"
)

# Validate the connection works
if service.validate_connection():
    dim = service.get_embedding_dimension()
    print(f"Connected! Embedding dimension: {dim}")
