Implementation:Ucbepic Docetl Directive HierarchicalReduce

Knowledge Sources	Ucbepic_Docetl
Domains	Pipeline_Optimization, LLM_Operations
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the DocETL reasoning optimizer, that rewrites a single reduce operation into a two-level hierarchical aggregation.

Description

The HierarchicalReduceDirective class rewrites a reduce operation that aggregates large groups of documents into two stages. The first Reduce aggregates at a finer granularity (reduce_key + additional_key); the second Reduce rolls those intermediate results up to the desired level (reduce_key only). Optionally, a Map operation can be inserted before the first Reduce to extract a synthetic sub-key (e.g., extracting the city from post content). This hierarchical approach can capture nuances that might be lost in a single large-scale aggregation.
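The two-stage flow described above can be illustrated with a minimal sketch in plain Python. This is not the directive's implementation: `summarize` is a hypothetical stand-in for the LLM-backed reduce, and the function name, the field names (`state`, `city`, `content`), and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize(texts):
    # Hypothetical stand-in for an LLM reduce call: joins the inputs
    # in sorted order so the data flow is easy to follow.
    return " | ".join(sorted(texts))


def hierarchical_reduce(docs, reduce_key, additional_key, text_field):
    # Level 1: aggregate per (reduce_key, additional_key) group.
    fine = defaultdict(list)
    for doc in docs:
        fine[(doc[reduce_key], doc[additional_key])].append(doc[text_field])
    level1 = [
        {reduce_key: key, additional_key: sub, "summary": summarize(texts)}
        for (key, sub), texts in fine.items()
    ]

    # Level 2: roll the level-1 summaries up per reduce_key only.
    coarse = defaultdict(list)
    for row in level1:
        coarse[row[reduce_key]].append(f"{row[additional_key]}: {row['summary']}")
    return {key: summarize(parts) for key, parts in coarse.items()}


posts = [
    {"state": "CA", "city": "LA", "content": "traffic"},
    {"state": "CA", "city": "SF", "content": "fog"},
    {"state": "CA", "city": "LA", "content": "beaches"},
    {"state": "NY", "city": "NYC", "content": "subway"},
]
result = hierarchical_reduce(posts, "state", "city", "content")
```

Each state-level result is built from per-city summaries rather than from the raw posts, which is the property the directive relies on to limit how many documents any single aggregation has to process.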

Usage

The MOAR agent applies this directive when a reduce operation processes many documents per group and would benefit from aggregating at a finer granularity before rolling up. It is useful when the data has a semantic hierarchy (e.g., aggregate by state + city first, then by state only) or when a single large-scale aggregation risks losing information.
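As a rough sketch of the rewrite at the configuration level, the snippet below splits one reduce config into a fine-grained reduce followed by a roll-up reduce. The helper `split_reduce`, the `_fine` and `_rollup` name suffixes, and the sample config are hypothetical. The `name`, `type`, `reduce_key`, and `prompt` fields are assumed to follow DocETL's operation config conventions. The sketch copies the prompt unchanged, whereas a real rewrite would need a separate prompt for each level.

```python
import copy


def split_reduce(op, additional_key):
    # Normalize reduce_key to a list (assumed to be a string or a list).
    keys = op["reduce_key"]
    keys = [keys] if isinstance(keys, str) else list(keys)

    # First reduce: finer granularity (reduce_key + additional_key).
    first = copy.deepcopy(op)
    first["name"] = f"{op['name']}_fine"
    first["reduce_key"] = keys + [additional_key]

    # Second reduce: roll up to the original reduce_key only.
    second = copy.deepcopy(op)
    second["name"] = f"{op['name']}_rollup"
    second["reduce_key"] = keys

    return [first, second]


original = {
    "name": "summarize_posts",
    "type": "reduce",
    "reduce_key": "state",
    "prompt": "Summarize the posts for this group.",
}
new_ops = split_reduce(original, "city")
```

Keeping the original config untouched (via `copy.deepcopy`) makes it easy to compare the rewritten pipeline with the original one.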

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: docetl/reasoning_optimizer/directives/hierarchical_reduce.py
Lines: 1-322

Signature

class HierarchicalReduceDirective(Directive):
    name = "hierarchical_reduce"
    description = "Transform a reduce into (Map* ->) Reduce -> Reduce for hierarchical two-level aggregation."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...

Import

from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective

I/O Contract

Inputs

Name	Type	Required	Description
op_config	Dict	Yes	Operation configuration to transform
pipeline_ops	List[Dict]	Yes	Full pipeline operations list
op_idx	int	Yes	Index of target operation
dataset_descriptions	Dict	Yes	Dataset schema descriptions

Outputs

Name	Type	Description
new_ops	List[Dict]	Transformed operation configs
new_steps	List[Dict]	Updated pipeline steps
explanation	str	Human-readable description of changes
metadata	dict	Additional metadata about the transformation

Usage Examples

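# Directives are typically invoked automatically by the MOAR agent.
# Example of manual invocation; op_config, pipeline_ops, op_idx, and
# dataset_descriptions are assumed to be defined by the caller (see Inputs).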
from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective

directive = HierarchicalReduceDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)
