Implementation:Eventual Inc Daft AI Embed Text


Knowledge Sources
Domains: Data_Engineering, Machine_Learning
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for computing text embeddings on DataFrame columns.

Description

The embed_text function returns an expression that embeds text using a specified embedding model and provider. It supports both local model inference (via the transformers provider) and remote API-based embedding (via the openai provider or other compatible APIs). The function automatically selects between synchronous and asynchronous execution based on the provider, and supports configurable output dimensions, batch sizes, GPU allocation, and concurrency.

Usage

Import and use this function when you need to compute dense vector embeddings of text data for semantic search, clustering, or downstream ML tasks.
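As a rough illustration of the semantic search use case mentioned above, the sketch below ranks documents against a query by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and the helper cosine_similarity is a hypothetical name written for this example; neither comes from Daft.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumption: real ones would be
# read from the embeddings column and have hundreds of dimensions).
documents = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

# Rank document names from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
best_match = ranked[0]
```

Because every embedding in a column has the same length, a pairwise comparison like this is well defined for any two rows.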

Code Reference

Source Location

  • Repository: Daft
  • File: daft/functions/ai/__init__.py
  • Lines: L72-154

Signature

def embed_text(
    text: Expression,
    *,
    provider: str | Provider | None = None,
    model: str | None = None,
    dimensions: int | None = None,
    **options: Unpack[EmbedTextOptions],
) -> Expression
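The bare * in the signature makes every parameter after text keyword-only. The sketch below shows the effect with a hypothetical stand-in, embed_text_stub, that mirrors only the shape of the signature; it is not Daft's implementation and does no embedding.

```python
def embed_text_stub(text, *, provider=None, model=None, dimensions=None, **options):
    """Hypothetical stand-in with the same parameter layout as the signature above."""
    return {
        "text": text,
        "provider": provider,
        "model": model,
        "dimensions": dimensions,
        "options": options,
    }


# Named arguments are accepted; names that are not explicit parameters
# (here batch_size) are collected into the options mapping.
call = embed_text_stub("text_col", provider="openai", dimensions=256, batch_size=32)

# Passing the provider positionally is rejected because of the bare `*`.
try:
    embed_text_stub("text_col", "openai")
    positional_accepted = True
except TypeError:
    positional_accepted = False
```

In practice this means provider, model, and dimensions should always be spelled out by name at the call site.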

Import

from daft.functions.ai import embed_text

# or
from daft.functions import embed_text

I/O Contract

Inputs

Name Type Required Description
text Expression (String) Yes The input text column expression to embed.
provider str | Provider | None No The embedding provider (e.g., "transformers", "openai"). Defaults to "transformers" when not specified.
model str | None No The embedding model name (e.g., "sentence-transformers/all-MiniLM-L6-v2"). If None, the provider's default model is used.
dimensions int | None No Number of output embedding dimensions, if the provider and model support specifying it. If None, the model's default is used.
**options EmbedTextOptions No Additional provider-specific options (e.g., batch_size, concurrency).

Outputs

Name Type Description
return Expression (FixedSizeList[Float32]) An Embedding expression containing fixed-size float vectors representing the text embeddings.

Usage Examples

Basic Usage

import daft
from daft.functions import embed_text

df = daft.from_pydict({"text": ["Hello world", "Daft is a distributed dataframe"]})
df = df.with_column(
    "embeddings",
    embed_text(
        daft.col("text"),
        provider="transformers",
        model="sentence-transformers/all-MiniLM-L6-v2",
    ),
)
df.show()

Using OpenAI Provider

import daft
from daft.functions import embed_text

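# Assumes OpenAI credentials are already configured for the provider
# (for example via the OPENAI_API_KEY environment variable).
df = daft.from_pydict({"text": ["semantic search query", "document to embed"]})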
df = df.with_column(
    "embeddings",
    embed_text(
        daft.col("text"),
        provider="openai",
        model="text-embedding-3-small",
        dimensions=256,
    ),
)
df.show()

Related Pages

Implements Principle

Requires Environment
