Implementation:Ucbepic Docetl Directive CascadeFiltering
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the DocETL reasoning optimizer, for injecting a cascade of cheaper pre-filters ahead of an expensive filter operation.
Description
The CascadeFilteringDirective class reduces filtering cost by inserting a cascade of progressively more expensive pre-filters ahead of the main filter, so that most documents are rejected before the costly step runs. The cascade starts with deterministic code filters (the cheapest stage), continues with gpt-5-nano filters (ordered by prompt length), and ends with the original expensive filter. Pre-filters prioritize high recall (they rarely reject valid documents) and can have lower precision.
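The ordering described above can be sketched in a few lines. This is a minimal illustration, not the directive's actual implementation: the operation dictionaries, the `order_cascade` helper, and the choice to place shorter prompts first are all assumptions made for the example (the source only says the LLM pre-filters are ordered by prompt length).

```python
from typing import Any, Dict, List


def order_cascade(
    pre_filters: List[Dict[str, Any]], main_filter: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return pre-filters sorted cheapest-first, followed by the main filter.

    Hypothetical helper: code filters sort before LLM filters, and LLM
    filters are ordered by prompt length (shorter first is an assumption).
    """

    def cost_rank(op: Dict[str, Any]):
        is_llm = op.get("type") != "code_filter"  # False sorts before True
        return (is_llm, len(op.get("prompt", "")))

    return sorted(pre_filters, key=cost_rank) + [main_filter]


# Hypothetical operation configs, for illustration only.
pre_filters = [
    {
        "name": "llm_detailed",
        "type": "filter",
        "model": "gpt-5-nano",
        "prompt": "Does this document describe a customer asking for a refund? Answer true or false.",
    },
    {
        "name": "keyword_check",
        "type": "code_filter",
        "code": "def transform(doc):\n    return 'refund' in doc['text'].lower()",
    },
    {
        "name": "llm_short",
        "type": "filter",
        "model": "gpt-5-nano",
        "prompt": "Mentions a refund?",
    },
]
main_filter = {
    "name": "refund_filter",
    "type": "filter",
    "prompt": "Decide whether the customer is entitled to a refund under the policy.",
}

cascade_names = [op["name"] for op in order_cascade(pre_filters, main_filter)]
print(cascade_names)
```

Sorting on a `(is_llm, prompt_length)` tuple keeps the rule in one place: the boolean separates deterministic stages from LLM stages, and the prompt length breaks ties among the LLM stages.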
Usage
The MOAR agent applies this directive when the pipeline contains an expensive Filter operation (one using a costly model or a complex prompt) and the data contains patterns that allow for cheaper pre-filtering. Pre-filters may pass documents that the main filter later rejects, but they must rarely drop a document the main filter would keep: a document rejected early never reaches the final filter, which is the stage that provides the actual precision.
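To see why a high-recall, low-precision pre-filter still saves cost, the following sketch runs a small cascade and counts how many documents each stage has to evaluate. Everything here is a stand-in: `run_cascade`, the keyword pre-filter, and the `expensive_filter` function (a plain Python predicate standing in for an LLM call) are hypothetical and not part of DocETL.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


def run_cascade(docs: List[Doc], stages: List[Predicate]) -> Tuple[List[Doc], List[int]]:
    """Apply stages in order; each stage only sees survivors of the previous one.

    Returns the surviving documents and the number of documents each
    stage evaluated.
    """
    survivors = list(docs)
    evaluated_per_stage: List[int] = []
    for stage in stages:
        evaluated_per_stage.append(len(survivors))
        survivors = [doc for doc in survivors if stage(doc)]
    return survivors, evaluated_per_stage


def keyword_prefilter(doc: Doc) -> bool:
    # Cheap and deliberately permissive: keeps anything mentioning a refund.
    return "refund" in doc["text"].lower()


def expensive_filter(doc: Doc) -> bool:
    # Stand-in for the costly LLM filter that makes the final decision.
    text = doc["text"].lower()
    return "refund" in text and "requests" in text


docs = [
    {"text": "Customer requests a refund for a damaged item"},
    {"text": "Shipping update: package delivered"},
    {"text": "Our refund policy page was updated"},
    {"text": "Password reset instructions"},
]

kept, evaluated = run_cascade(docs, [keyword_prefilter, expensive_filter])
print(evaluated)  # documents seen by each stage
print([d["text"] for d in kept])
```

In this toy run the pre-filter lets one irrelevant document through (lower precision), but it drops nothing the final stage would have kept, so the result matches running the expensive stage alone while that stage evaluates only half of the documents.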
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/cascade_filtering.py
- Lines: 1-443
Signature
class CascadeFilteringDirective(Directive):
    name = "cascade_filtering"
    description = "Optimizes filtering costs by injecting a cascade of cheaper filters before the main filter."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
Import
from docetl.reasoning_optimizer.directives.cascade_filtering import CascadeFilteringDirective
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
Usage Examples
# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation (op_config, pipeline_ops, op_idx, and
# dataset_descriptions are assumed to be defined as in the I/O Contract):
from docetl.reasoning_optimizer.directives.cascade_filtering import CascadeFilteringDirective

directive = CascadeFilteringDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)