Implementation:Ucbepic Docetl MapOperation Execute
| Knowledge Sources | |
|---|---|
| Domains | NLP, LLM_Operations |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete operation, provided by DocETL's operations module, for applying LLM transformations to individual documents or chunks.
Description
MapOperation processes each input document independently through an LLM, using a Jinja2 prompt template and a structured output schema. It supports gleaning (iterative validation), batching (multiple documents per call), parallel execution, calibration, and tool use. It is the most widely used operation in DocETL pipelines.
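Gleaning can be pictured as a generate, validate, refine loop. The sketch below is illustrative only and does not reproduce DocETL's internals: `glean`, `fake_generate`, and `fake_validate` are hypothetical names, the two callables stand in for LLM calls, and the `num_rounds` cap is an assumption about how the loop is bounded.

```python
from typing import Callable, Optional


def glean(
    doc: dict,
    generate: Callable[[dict, Optional[str]], dict],
    validate: Callable[[dict, dict], tuple[bool, str]],
    num_rounds: int = 2,
) -> dict:
    """Generate an output, then refine it while the validator asks for changes."""
    output = generate(doc, None)  # first attempt, no feedback yet
    for _ in range(num_rounds):
        ok, feedback = validate(doc, output)
        if ok:
            break  # validator is satisfied, stop early
        output = generate(doc, feedback)  # retry with the validator's feedback
    return output


# Hypothetical stand-ins for LLM calls: the generator only finds the second
# entity once feedback tells it something is missing.
def fake_generate(doc: dict, feedback: Optional[str]) -> dict:
    entities = ["Ada"] if feedback is None else ["Ada", "Turing"]
    return {"entities": entities}


def fake_validate(doc: dict, output: dict) -> tuple[bool, str]:
    missing = [n for n in ("Ada", "Turing") if n not in output["entities"]]
    return (not missing, f"missing: {missing}")


result = glean({"text": "Ada met Turing."}, fake_generate, fake_validate)
print(result)
```

Capping the number of rounds keeps the cost of validation bounded even when the validator is never satisfied.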
Usage
Use MapOperation for per-document or per-chunk LLM processing. In a chunking pipeline, it processes enriched chunks after GatherOperation. It can also be used standalone for simple document transformations.
Code Reference
Source Location
- Repository: docetl
- File: docetl/operations/map.py
- Lines: L23-857
Signature
class MapOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "map"
        output: dict[str, Any] | None = None
        prompt: str | None = None
        model: str | None = None
        optimize: bool | None = None
        batch_size: int | None = None
        gleaning: dict | None = None
        timeout: int | None = None
        litellm_completion_kwargs: dict[str, Any] = {}

    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]:
        """Process each document via LLM. Returns (results, total_cost)."""
Import
from docetl.operations.map import MapOperation
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | Jinja2 template with {{ input.field }} variables |
| output.schema | dict | Yes | Expected output field names and types |
| model | str | No | LLM model name (defaults to pipeline default) |
| input_data | list[dict] | Yes | Documents or chunks to process |
Outputs
| Name | Type | Description |
|---|---|---|
| results | list[dict] | Documents with LLM-generated fields added |
| cost | float | Total LLM API cost |
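The contract above (documents in, enriched documents plus a total cost out) can be sketched with the standard library alone. This is a minimal illustration, not DocETL's implementation: `fake_llm_call` and `map_documents` are hypothetical, and the per-word cost is invented purely so that there is something to sum.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(doc: dict) -> tuple[dict, float]:
    """Hypothetical stand-in for one LLM call: returns (new_fields, cost)."""
    words = doc["text"].split()
    return {"word_count": len(words)}, 0.001 * len(words)


def map_documents(
    input_data: list[dict], max_workers: int = 4
) -> tuple[list[dict], float]:
    """Process each document independently and in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields responses in the same order as the inputs
        responses = list(pool.map(fake_llm_call, input_data))
    # Each result is the original document with the generated fields added
    results = [{**doc, **fields} for doc, (fields, _) in zip(input_data, responses)]
    total_cost = sum(cost for _, cost in responses)
    return results, total_cost


docs = [{"id": 1, "text": "alpha beta"}, {"id": 2, "text": "gamma"}]
results, cost = map_documents(docs)
print(results, cost)
```

Because every document is handled independently, the calls can run concurrently without coordination; only the cost needs to be aggregated afterwards.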
Usage Examples
operations:
  - name: extract_info
    type: map
    prompt: |
      Extract key information from this text chunk:
      {{ input.content_chunk_rendered }}
    output:
      schema:
        key_findings: "list[str]"
        entities: "list[str]"
    model: gpt-4o
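To make the role of the output schema concrete, the following sketch checks whether a result dictionary has the fields and types that the configuration above declares. It is a hypothetical helper written for illustration, not part of DocETL, and it only understands the three type strings listed in `checks`.

```python
def matches_schema(result: dict, schema: dict[str, str]) -> bool:
    """Return True if every schema field is present with the declared type."""
    # Only a few type strings are handled in this sketch; an unknown type
    # name raises KeyError rather than being silently accepted.
    checks = {
        "str": lambda v: isinstance(v, str),
        "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "list[str]": lambda v: isinstance(v, list)
        and all(isinstance(x, str) for x in v),
    }
    return all(
        field in result and checks[type_name](result[field])
        for field, type_name in schema.items()
    )


schema = {"key_findings": "list[str]", "entities": "list[str]"}
good = {"key_findings": ["Revenue grew"], "entities": ["Acme"]}
bad = {"key_findings": "Revenue grew", "entities": ["Acme"]}

print(matches_schema(good, schema))
print(matches_schema(bad, schema))
```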