Implementation: Eventual Inc Daft AI Prompt
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Artificial_Intelligence |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool from the Daft library for prompting LLMs over DataFrame columns.
Description
The prompt function returns an expression that sends text (and optionally images or files) to a large language model and returns the generated response. It supports multiple providers (OpenAI, Anthropic, vLLM, and any OpenAI-compatible API), structured outputs via Pydantic models, system messages, and multimodal inputs. For vLLM providers, a specialized native Rust execution path with prefix caching is used for maximum GPU efficiency.
Usage
Import and use this function when you need to process DataFrame rows through an LLM for classification, summarization, extraction, or generation tasks.
Code Reference
Source Location
- Repository: Daft
- File: daft/functions/ai/__init__.py, lines 453-652
Signature
def prompt(
    messages: list[Expression] | Expression,
    return_format: BaseModel | None = None,
    *,
    system_message: str | None = None,
    provider: str | Provider | None = None,
    model: str | None = None,
    **options: Any,
) -> Expression
Import
from daft.functions.ai import prompt
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| messages | list[Expression] \| Expression | Yes | The prompt text expression(s). Each expression can be plain text, image data, or file data (PDF, audio, video). |
| return_format | BaseModel \| None | No | A Pydantic model for structured output. When provided, the LLM response is parsed into the model's schema. Defaults to None (plain text response). |
| system_message | str \| None | No | A system message providing instructions to the LLM. Applied to all rows. |
| provider | str \| Provider \| None | No | The LLM provider to use (e.g., "openai", "anthropic", "vllm"). Defaults to "openai". |
| model | str \| None | No | The specific model to use (e.g., "gpt-5-nano"). If None, the provider's default model is used. |
| **options | Any | No | Additional provider-specific options (e.g., temperature, max_tokens). |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Expression (String or Struct) | A String expression with the LLM response text, or a Struct expression matching the Pydantic model schema when return_format is provided. |
Usage Examples
Basic Usage
import daft
from daft.functions.ai import prompt
df = daft.from_pydict({"text": ["What is the capital of France?", "What is 2 + 2?"]})
df = df.with_column(
    "response",
    prompt(
        daft.col("text"),
        provider="openai",
        model="gpt-5-nano",
    ),
)
df.show()
Structured Output with Pydantic
import daft
from daft.functions.ai import prompt
from pydantic import BaseModel, Field
class Sentiment(BaseModel):
    label: str = Field(description="Sentiment label: positive, negative, or neutral")
    confidence: float = Field(description="Confidence score between 0 and 1")
df = daft.from_pydict({"review": ["Great product!", "Terrible experience."]})
df = df.with_column(
    "sentiment",
    prompt(
        daft.col("review"),
        return_format=Sentiment,
        system_message="Classify the sentiment of the review.",
        provider="openai",
        model="gpt-5-nano",
    ),
)
df.show()
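Conceptually, a structured output works by asking the provider for JSON conforming to the Pydantic model's schema, which is then surfaced as a Struct column. A minimal stdlib-only sketch of that parse-and-check step, where the raw JSON string is a stand-in for a real model response rather than actual output:

```python
# Illustrative sketch: parsing a JSON model response into the fields that a
# Pydantic return_format like Sentiment(label, confidence) declares.
# The raw string below is a stand-in, not real model output.
import json

raw_response = '{"label": "positive", "confidence": 0.97}'

parsed = json.loads(raw_response)

# Minimal validation mirroring the Sentiment schema from the example above.
assert parsed["label"] in {"positive", "negative", "neutral"}
assert 0.0 <= parsed["confidence"] <= 1.0

print(parsed["label"], parsed["confidence"])
```

In the DataFrame, the resulting Struct column exposes each schema field, so downstream expressions can filter or sort on, for example, the confidence value directly.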