Principle:Ucbepic Docetl Pandas Semantic Operations
| Knowledge Sources | |
|---|---|
| Domains | Data_Science, LLM_Operations |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
A DataFrame integration principle that enables LLM-powered operations directly on Pandas DataFrames through a semantic accessor API.
Description
Pandas Semantic Operations extends Pandas DataFrames with a .semantic accessor that provides LLM-powered operations (map, filter, reduce, agg, merge, split, gather, unnest) as DataFrame methods. This enables data scientists to use DocETL operations in familiar Pandas workflows without constructing explicit Pipeline objects.
Each operation records itself in an operation history and adds its LLM cost to a running total, which supports reproducible LLM-powered data analysis within Jupyter notebooks and data science scripts.
Usage
Use the .semantic accessor to add LLM-powered transformations inline when the data already lives in a Pandas DataFrame. Configure the model with df.semantic.set_config(default_model="gpt-4o").
Theoretical Basis
Accessor-based API extension:
- Registration: Pandas accessor registered via @pd.api.extensions.register_dataframe_accessor
- Delegation: Each accessor method wraps a DocETL operation and runs it via DSLRunner
- History Tracking: Operation history stored in DataFrame.attrs for reproducibility
- Cost Accumulation: Cumulative LLM costs tracked across chained operations
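
The four mechanisms above can be illustrated with a minimal sketch. This is not DocETL's implementation: the accessor name toy_semantic, the class ToySemanticAccessor, the attrs keys "history" and "total_cost", and the flat cost_per_row figure are all hypothetical, and plain Python callables stand in for the LLM calls that DocETL would run via DSLRunner. Only the registration decorator and DataFrame.attrs are real Pandas features.

```python
import pandas as pd


# Registration: expose the class below as df.toy_semantic (hypothetical name).
@pd.api.extensions.register_dataframe_accessor("toy_semantic")
class ToySemanticAccessor:
    """Toy stand-in for an LLM-backed accessor; no model is ever called."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        self._df = pandas_obj

    def _record(self, result: pd.DataFrame, op_type: str, cost: float) -> pd.DataFrame:
        # History tracking: copy the source frame's history and append this step.
        history = list(self._df.attrs.get("history", []))
        history.append({"type": op_type, "cost": cost})
        result.attrs["history"] = history
        # Cost accumulation: carry the running total across chained calls.
        result.attrs["total_cost"] = self._df.attrs.get("total_cost", 0.0) + cost
        return result

    def map(self, column, new_column, fn, cost_per_row=0.001) -> pd.DataFrame:
        # Delegation: a real accessor would hand this step to an LLM operation;
        # here `fn` is an ordinary Python callable applied to each value.
        result = self._df.copy()
        result[new_column] = result[column].map(fn)
        return self._record(result, "map", cost_per_row * len(self._df))

    def filter(self, column, predicate, cost_per_row=0.001) -> pd.DataFrame:
        # Keep only the rows for which the predicate returns a truthy value.
        mask = self._df[column].map(predicate).astype(bool)
        result = self._df[mask].copy()
        return self._record(result, "filter", cost_per_row * len(self._df))


# Usage: chain two operations and inspect the recorded metadata.
df = pd.DataFrame({"text": ["good", "bad", "great"]})
out = (
    df.toy_semantic.map("text", "length", len)
      .toy_semantic.filter("length", lambda n: n > 3)
)
print(out.attrs["history"])
print(out.attrs["total_cost"])
```

Because each method returns a new DataFrame whose attrs carry the history forward, the calls chain naturally and the source frame's metadata is left untouched.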