Implementation:Ucbepic Docetl Directive DeterministicDocCompression
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the DocETL reasoning optimizer, that uses deterministic logic (regular expressions and pattern matching) to compress documents before expensive LLM operations run on them.
Description
The DeterministicDocCompressionDirective class reduces LLM processing costs by using deterministic logic (regex, patterns) to compress documents before expensive downstream operations. It inserts a Code Map operation that removes irrelevant content through pattern matching and keyword extraction, keeping only the text spans that match predefined relevance patterns, together with their surrounding context.
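The span-keeping idea described above can be illustrated with a minimal sketch, assuming the relevance patterns are plain regular expressions and that context is measured in characters. The helper `compress_document` and its `context_chars` parameter are hypothetical; this is not the code the directive actually generates, only an illustration of "keep matches plus surrounding context".

```python
import re
from typing import List, Tuple


def compress_document(text: str, patterns: List[str], context_chars: int = 200) -> str:
    """Hypothetical helper: keep regex-matched spans plus surrounding context."""
    # Collect a (start, end) window around every match of every pattern.
    windows: List[Tuple[int, int]] = []
    for pattern in patterns:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start = max(0, match.start() - context_chars)
            end = min(len(text), match.end() + context_chars)
            windows.append((start, end))

    if not windows:
        return ""  # nothing relevant found in this document

    # Merge overlapping windows so no text is emitted twice.
    windows.sort()
    merged = [windows[0]]
    for start, end in windows[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))

    # Join the kept spans with a marker showing where text was removed.
    return " ... ".join(text[s:e] for s, e in merged)
```

Merging the windows before slicing keeps the output free of duplicated text when matches sit close together. A production version might fall back to the original document instead of returning an empty string when no pattern matches.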
Usage
The MOAR agent applies this directive when documents contain identifiable patterns or keywords and the goal is to reduce token costs for downstream LLM operations. Removing distracting, irrelevant content can also improve the accuracy of those operations.
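The applicability condition above can be approximated with a simple heuristic. The sketch below is hypothetical and is not the directive's actual `check_applicability` logic; it only mirrors that method's `Tuple[bool, str]` return shape. It assumes that compression is worthwhile when documents are long on average and when the candidate patterns match most of them; the thresholds `min_avg_chars` and `min_hit_rate` are illustrative assumptions.

```python
import re
from typing import List, Tuple


def should_compress(
    docs: List[str],
    patterns: List[str],
    min_avg_chars: int = 5000,   # assumed length threshold, not from DocETL
    min_hit_rate: float = 0.5,   # assumed fraction of docs that must match
) -> Tuple[bool, str]:
    """Hypothetical heuristic: compress only long documents that usually contain the patterns."""
    if not docs:
        return False, "no documents to inspect"

    # Short documents leave little room for token savings.
    avg_chars = sum(len(d) for d in docs) / len(docs)
    if avg_chars < min_avg_chars:
        return False, f"documents are short (avg {avg_chars:.0f} chars); little to save"

    # If the patterns rarely match, compression would discard too much.
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = sum(1 for d in docs if any(c.search(d) for c in compiled))
    hit_rate = hits / len(docs)
    if hit_rate < min_hit_rate:
        return False, f"patterns match only {hit_rate:.0%} of documents"

    return True, f"avg {avg_chars:.0f} chars, patterns match {hit_rate:.0%} of documents"
```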
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/deterministic_doc_compression.py
- Lines: 1-350
Signature
```python
class DeterministicDocCompressionDirective(Directive):
    name = "deterministic_doc_compression"
    description = "Reduces LLM processing costs by using deterministic logic to compress documents before expensive downstream operations."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
```
Import
```python
from docetl.reasoning_optimizer.directives.deterministic_doc_compression import DeterministicDocCompressionDirective
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
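To make the four outputs concrete, the following hypothetical sketch shows one way a compression step could be spliced into a pipeline ahead of the target operation and returned as `(new_ops, new_steps, explanation, metadata)`. The function `insert_compression_op`, the generated operation name, and the `name`/`type`/`code` fields are illustrative assumptions, not DocETL's confirmed schema; it also assumes that `new_steps` means the full updated operation list.

```python
import copy
from typing import Any, Dict, List, Tuple


def insert_compression_op(
    pipeline_ops: List[Dict[str, Any]],
    op_idx: int,
    code: str,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]], str, Dict[str, Any]]:
    """Hypothetical sketch: place a code-map compression op before the target op."""
    target = pipeline_ops[op_idx]
    compress_op = {
        # Field names are illustrative assumptions, not DocETL's confirmed schema.
        "name": f"compress_before_{target['name']}",
        "type": "code_map",
        "code": code,
    }
    # The replacement for the target: compression first, then the original op.
    new_ops = [compress_op, copy.deepcopy(target)]
    # Rebuild the full list without mutating the caller's pipeline.
    new_steps = (
        [copy.deepcopy(op) for op in pipeline_ops[:op_idx]]
        + new_ops
        + [copy.deepcopy(op) for op in pipeline_ops[op_idx + 1:]]
    )
    explanation = (
        f"Inserted {compress_op['name']} before {target['name']} to drop "
        "irrelevant text deterministically."
    )
    metadata = {"inserted_op": compress_op["name"], "target_op": target["name"]}
    return new_ops, new_steps, explanation, metadata
```

Deep-copying the surrounding operations keeps the input pipeline unchanged, so the optimizer can compare the original and rewritten plans side by side.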
Usage Examples
```python
# Directives are typically invoked automatically by the MOAR agent.
# Manual invocation (op_config, pipeline_ops, op_idx and
# dataset_descriptions are assumed to be defined already):
from docetl.reasoning_optimizer.directives.deterministic_doc_compression import DeterministicDocCompressionDirective

directive = DeterministicDocCompressionDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)
    print(explanation)
else:
    print(f"Not applicable: {reason}")
```