Implementation:Ucbepic Docetl SemanticAccessor Usage


Knowledge Sources
Domains: Data_Science, LLM_Operations
Last Updated: 2026-02-08 01:40 GMT

Overview

A concrete Pandas DataFrame accessor, provided by DocETL, for LLM-powered semantic operations.

Description

The SemanticAccessor class is registered as a Pandas DataFrame accessor at df.semantic. It provides methods for map, filter, reduce, agg, merge, split, gather, and unnest operations. Each method constructs a DocETL operation config, runs it through DSLRunner, records the operation in history, and returns a new DataFrame with results.
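The build-config, run, record, return flow described above can be illustrated with a toy accessor. This is a minimal sketch, not DocETL code: the accessor name toy_semantic, the class ToySemanticAccessor, the attrs key toy_history, and the use of a plain Python function in place of an LLM call are all assumptions made for illustration. Only pd.api.extensions.register_dataframe_accessor is the real pandas API.

```python
import pandas as pd


# Hypothetical accessor name; DocETL registers its own under "semantic".
@pd.api.extensions.register_dataframe_accessor("toy_semantic")
class ToySemanticAccessor:
    """Toy stand-in for an LLM-backed accessor (illustration only)."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    @property
    def history(self) -> list[dict]:
        # Assumption: history travels with the DataFrame in its attrs dict.
        return list(self._df.attrs.get("toy_history", []))

    def map(self, fn, output_column: str) -> pd.DataFrame:
        # 1. Build a config describing the operation.
        config = {"type": "map", "output_column": output_column}
        # 2. "Run" it: a plain Python function stands in for the LLM call.
        values = [fn(row) for row in self._df.to_dict("records")]
        # 3. Return a new DataFrame, leaving the original untouched.
        result = self._df.copy()
        result[output_column] = values
        # 4. Record the operation on the result so later calls can see it.
        result.attrs["toy_history"] = self.history + [config]
        return result


df = pd.DataFrame({"text": ["alpha beta", "gamma"]})
mapped = df.toy_semantic.map(lambda row: len(row["text"].split()), "word_count")
```

Here the original df keeps its columns and an empty history, while mapped gains the word_count column and one history entry. How SemanticAccessor itself stores history and invokes DSLRunner is not shown in this page, so the storage choice above should be read as one possible design.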

Usage

Import docetl.apis.pd_accessors to register the accessor; the import is needed only for this side effect. Then call df.semantic.map(), df.semantic.filter(), and the other methods on any Pandas DataFrame. Set the default model with df.semantic.set_config(default_model="...").

Code Reference

Source Location

  • Repository: docetl
  • File: docetl/apis/pd_accessors.py
  • Lines: L61-L1069

Signature

@pd.api.extensions.register_dataframe_accessor("semantic")
class SemanticAccessor:
    def __init__(self, df: pd.DataFrame): ...

    def set_config(self, **config): ...

    def map(self, prompt: str, output: dict | None = None, **kwargs) -> pd.DataFrame: ...
    def filter(self, prompt: str, **kwargs) -> pd.DataFrame: ...
    def reduce(self, prompt: str, output: dict | None = None,
               reduce_keys: str | list[str] = ["_all"], **kwargs) -> pd.DataFrame: ...
    def agg(self, reduce_prompt: str, ...) -> pd.DataFrame: ...
    def merge(self, right: pd.DataFrame, comparison_prompt: str, **kwargs) -> pd.DataFrame: ...
    def split(self, split_key: str, method: str, method_kwargs: dict, **kwargs) -> pd.DataFrame: ...
    def gather(self, ...) -> pd.DataFrame: ...
    def unnest(self, ...) -> pd.DataFrame: ...

    @property
    def total_cost(self) -> float: ...

    @property
    def history(self) -> list[OpHistory]: ...

Import

import docetl.apis.pd_accessors  # Registers the .semantic accessor
import pandas as pd

I/O Contract

Inputs

Name    Type  Required  Description
prompt  str   Yes       Jinja2 template for LLM operations
output  dict  No        Output schema definition
model   str   No        Set via set_config(default_model=...)
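Since prompt is a Jinja2 template, it may help to see how a placeholder such as {{ input.text }} expands for each row. The sketch below uses the jinja2 library directly and assumes each row is passed to the template as a dict named input, which matches the placeholder syntax in the usage examples; the exact rendering path inside DocETL is not confirmed by this page.

```python
from jinja2 import Template

prompt = "Extract entities from: {{ input.text }}"
rows = [{"text": "Document 1 content"}, {"text": "Document 2 content"}]

# Render one prompt per row; the row dict is exposed to the template as `input`
# (an assumption based on the placeholder used in the examples).
rendered = [Template(prompt).render(input=row) for row in rows]

for text in rendered:
    print(text)
```

Rendering a prompt this way before sending it to a model is a cheap check that column names in the template match the DataFrame.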

Outputs

Name        Type             Description
returns     pd.DataFrame     DataFrame with LLM-derived columns added
total_cost  float            Cumulative LLM cost across all operations
history     list[OpHistory]  Record of all applied operations

Usage Examples

import pandas as pd
import docetl.apis.pd_accessors

df = pd.DataFrame({"text": ["Document 1 content", "Document 2 content"]})
df.semantic.set_config(default_model="gpt-4o-mini")

# Map: extract entities
result = df.semantic.map(
    prompt="Extract entities from: {{ input.text }}",
    output={"schema": {"entities": "list[str]"}}
)

# Filter: keep relevant documents
filtered = result.semantic.filter(
    prompt="Is this document about technology? {{ input.text }}"
)

# Check costs
print(f"Total cost: ${filtered.semantic.total_cost:.2f}")

