Implementation:Ucbepic Docetl Directive HierarchicalReduce
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the DocETL reasoning optimizer, for transforming a reduce operation into a two-level hierarchical aggregation.
Description
The HierarchicalReduceDirective class rewrites a reduce operation that aggregates large groups of documents into two reduce steps: the first aggregates at a finer granularity (reduce_key + additional_key), and the second rolls those intermediate results up to the desired level (reduce_key only). Optionally, a Map operation can be inserted before the first Reduce to extract a synthetic sub-key (e.g., extracting city from post content). This hierarchical approach captures nuances that might be lost in a single large-scale aggregation.
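The two-level idea can be illustrated without DocETL at all. The sketch below is plain Python and is not the directive's implementation: `summarize` is a hypothetical stand-in for an LLM reduce call, and `hierarchical_reduce`, the field names, and the sample posts are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def summarize(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM reduce call: joins the sorted inputs."""
    return " | ".join(sorted(texts))


def hierarchical_reduce(
    docs: List[Dict[str, str]],
    reduce_key: str,
    additional_key: str,
    text_field: str,
    combine: Callable[[List[str]], str] = summarize,
) -> Dict[str, str]:
    # Level 1: aggregate at the finer granularity (reduce_key + additional_key).
    fine_groups = defaultdict(list)
    for doc in docs:
        fine_groups[(doc[reduce_key], doc[additional_key])].append(doc[text_field])
    fine_summaries = {key: combine(texts) for key, texts in fine_groups.items()}

    # Level 2: roll the intermediate summaries up to reduce_key only.
    coarse_groups = defaultdict(list)
    for (top_key, _sub_key), summary in fine_summaries.items():
        coarse_groups[top_key].append(summary)
    return {top_key: combine(parts) for top_key, parts in coarse_groups.items()}


posts = [
    {"state": "CA", "city": "LA", "content": "traffic"},
    {"state": "CA", "city": "SF", "content": "fog"},
    {"state": "CA", "city": "LA", "content": "beaches"},
    {"state": "NY", "city": "NYC", "content": "subway"},
]
result = hierarchical_reduce(posts, "state", "city", "content")
```

Each second-level call sees one summary per city rather than every raw post, which is where the smaller per-call input comes from.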
Usage
The MOAR agent applies this directive when a reduce operation processes many documents per group and would benefit from aggregating at a finer granularity before rolling up. It is useful when the data has a semantic hierarchy (e.g., aggregate by state+city first, then by state only) or when a single large-scale aggregation risks losing information.
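For the state+city example, the shape of the rewrite might be sketched as follows. This is an assumption-laden illustration, not the directive's actual output: `sketch_hierarchical_ops` is a hypothetical helper, the `_fine` and `_rollup` name suffixes are invented, and the config dict shows only the `name`, `type`, and `reduce_key` fields. A real rewrite would also need different prompts and output schemas at each level, which this sketch ignores.

```python
from typing import Any, Dict, List


def sketch_hierarchical_ops(
    reduce_op: Dict[str, Any], additional_key: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: derive a fine-grained reduce and a roll-up reduce."""
    keys = reduce_op["reduce_key"]
    # Normalize the key to a list so the additional key can be appended.
    keys = [keys] if isinstance(keys, str) else list(keys)

    # First reduce: group by reduce_key + additional_key.
    fine = {
        **reduce_op,
        "name": reduce_op["name"] + "_fine",
        "reduce_key": keys + [additional_key],
    }
    # Second reduce: roll up to the original reduce_key only.
    rollup = {
        **reduce_op,
        "name": reduce_op["name"] + "_rollup",
        "reduce_key": keys,
    }
    return [fine, rollup]


original = {"name": "summarize_by_state", "type": "reduce", "reduce_key": "state"}
ops = sketch_hierarchical_ops(original, "city")
```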
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/hierarchical_reduce.py
- Lines: 1-322
Signature
class HierarchicalReduceDirective(Directive):
    name = "hierarchical_reduce"
    description = "Transform a reduce into (Map* ->) Reduce -> Reduce for hierarchical two-level aggregation."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
Import
from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
Usage Examples
# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation; op_config, pipeline_ops, op_idx, and
# dataset_descriptions are assumed to be defined as in the Inputs table.
from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective

directive = HierarchicalReduceDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)