Implementation:Ucbepic Docetl Directive DocCompression
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the DocETL reasoning optimizer, that inserts an Extract operator to compress documents before expensive downstream LLM operations.
Description
The DocCompressionDirective class reduces LLM processing costs by inserting an Extract operator that compresses documents before expensive downstream operations, removing irrelevant content that could otherwise distract the LLM. Unlike the deterministic variant, this directive relies on an LLM-powered Extract operation to identify and retain the most relevant content, so compression reflects semantic relevance rather than fixed rules.
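To make the transformation concrete, here is a minimal sketch of the kind of pipeline rewrite described above: a compression step is inserted before the target operation, and the target's prompt is pointed at the compressed field. This is not DocETL's implementation. The function `insert_compression_step`, the inserted op's field names (`document_keys`, `output_key`), and the `<key>_compressed` naming scheme are hypothetical; the sketch only assumes Jinja-style `{{ input.<key> }}` references in prompts, as used in DocETL pipeline configs.

```python
import copy
from typing import Any, Dict, List


def insert_compression_step(
    pipeline_ops: List[Dict[str, Any]], op_idx: int, document_key: str
) -> List[Dict[str, Any]]:
    """Return a new op list with a hypothetical compression step before op_idx.

    The field names of the inserted op are illustrative, not DocETL's schema.
    """
    target = pipeline_ops[op_idx]
    compressed_key = f"{document_key}_compressed"  # hypothetical naming scheme
    extract_op = {
        "name": f"compress_{target['name']}",
        "type": "extract",
        "document_keys": [document_key],
        "prompt": f"Keep only the passages needed for the '{target['name']}' step.",
        "output_key": compressed_key,
    }
    new_ops = copy.deepcopy(pipeline_ops)  # leave the caller's list untouched
    downstream = new_ops[op_idx]
    if "prompt" in downstream:
        # Naive rewrite: point Jinja references like {{ input.text }} at the
        # compressed field. A real implementation would parse the template.
        downstream["prompt"] = downstream["prompt"].replace(
            f"input.{document_key}", f"input.{compressed_key}"
        )
    new_ops.insert(op_idx, extract_op)  # compression runs first
    return new_ops


ops = [{"name": "summarize", "type": "map", "prompt": "Summarize: {{ input.text }}"}]
for op in insert_compression_step(ops, 0, "text"):
    print(op["type"], op.get("prompt"))
```

Working on a deep copy keeps the original pipeline intact, which matters when an optimizer compares several candidate rewrites of the same pipeline.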
Usage
The MOAR agent applies this directive when documents contain irrelevant content and the goal is to reduce token costs for downstream LLM operations, while also improving accuracy by letting the LLM focus only on the essential content.
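The cost side of this trade-off can be estimated with a simplified per-document model. The sketch below is an assumption, not DocETL's cost accounting: it counts only input tokens, assumes the Extract step reads the full document once (possibly on a cheaper model), and assumes every downstream LLM operation reads only the compressed text. The names `estimate_compression_savings` and `CompressionEstimate` and all prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CompressionEstimate:
    baseline_cost: float    # downstream ops read the full document
    compressed_cost: float  # Extract pass plus downstream ops on compressed text
    net_savings: float      # > 0 means compression pays for itself


def estimate_compression_savings(
    full_tokens: int,
    compressed_tokens: int,
    downstream_ops: int,
    downstream_price_per_mtok: float,
    extract_price_per_mtok: float,
) -> CompressionEstimate:
    """Hypothetical input-token cost model for one document.

    Ignores output tokens and prompt-template overhead.
    """
    baseline = downstream_ops * full_tokens * downstream_price_per_mtok / 1e6
    compressed = (
        full_tokens * extract_price_per_mtok  # Extract reads the full document once
        + downstream_ops * compressed_tokens * downstream_price_per_mtok
    ) / 1e6
    return CompressionEstimate(baseline, compressed, baseline - compressed)


# Illustrative numbers only: a 20k-token document compressed to 2k tokens,
# one downstream op on a $10/M-token model, Extract on a $1/M-token model.
est = estimate_compression_savings(20_000, 2_000, 1, 10.0, 1.0)
print(f"baseline=${est.baseline_cost:.3f} compressed=${est.compressed_cost:.3f}")
# prints baseline=$0.200 compressed=$0.040
```

Under this model, running the Extract step on the same $10/M-token model would cost $0.220 per document, more than the $0.200 baseline, so with a single downstream operation compression pays off on cost only when the Extract step is cheaper or several operations reuse the compressed text. The accuracy benefit of removing distracting content is a separate consideration.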
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/doc_compression.py
- Lines: 1-254
Signature
```python
class DocCompressionDirective(Directive):
    name = "doc_compression"
    description = "Reduces LLM processing costs by using an Extract operator to compress documents before expensive downstream operations."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
```
Import
```python
from docetl.reasoning_optimizer.directives.doc_compression import DocCompressionDirective
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
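The I/O contract in these tables can be written down as a structural type, which makes it easy to type-check a caller or to swap in a stub during testing. This is a sketch that assumes the tables are complete: `DirectiveLike`, `NoOpDirective`, and `run_directive` are hypothetical names, not part of DocETL, and the real method signatures (elided as `...` in the Signature) may take additional parameters.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class DirectiveLike(Protocol):
    """Hypothetical structural type mirroring the I/O contract tables."""

    def check_applicability(
        self, op_config: Dict, pipeline_ops: List[Dict], op_idx: int,
        dataset_descriptions: Dict,
    ) -> Tuple[bool, str]: ...

    def apply(
        self, op_config: Dict, pipeline_ops: List[Dict], op_idx: int,
        dataset_descriptions: Dict,
    ) -> Tuple[List[Dict], List[Dict], str, dict]: ...


class NoOpDirective:
    """Toy stand-in that satisfies the contract without changing anything."""

    def check_applicability(self, op_config, pipeline_ops, op_idx, dataset_descriptions):
        return True, "always applicable (toy example)"

    def apply(self, op_config, pipeline_ops, op_idx, dataset_descriptions):
        # Return copies so callers can mutate results safely.
        return [dict(op_config)], list(pipeline_ops), "no changes made", {}


def run_directive(
    directive: DirectiveLike, op_config: Dict, pipeline_ops: List[Dict],
    op_idx: int, dataset_descriptions: Dict,
) -> Tuple[Optional[Tuple[List[Dict], List[Dict], str, dict]], str]:
    """Check applicability first; apply only if the directive accepts the op."""
    applicable, reason = directive.check_applicability(
        op_config, pipeline_ops, op_idx, dataset_descriptions
    )
    if not applicable:
        return None, reason
    return directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions), reason


result, reason = run_directive(NoOpDirective(), {"name": "x"}, [{"name": "x"}], 0, {})
print(reason)
```

Because `Protocol` uses structural typing, any class with matching method signatures (including a stub like `NoOpDirective`) is accepted by a static type checker without inheriting from a common base.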
Usage Examples
```python
# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation; op_config, pipeline_ops, op_idx and
# dataset_descriptions must already be defined (see the I/O Contract above).
from docetl.reasoning_optimizer.directives.doc_compression import DocCompressionDirective

directive = DocCompressionDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(
        op_config, pipeline_ops, op_idx, dataset_descriptions
    )
else:
    print(f"Directive not applicable: {reason}")
```