Implementation:Ucbepic Docetl Directive HierarchicalReduce

From Leeroopedia


Knowledge Sources
Domains Pipeline_Optimization, LLM_Operations
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the DocETL reasoning optimizer, for transforming a reduce operation into a two-level hierarchical aggregation.

Description

The HierarchicalReduceDirective class rewrites a reduce operation that aggregates large groups of documents into two stages: it first aggregates at a finer granularity (reduce_key + additional_key), then rolls the results up to the desired level (reduce_key only). Optionally, a Map operation can be inserted before the first Reduce to extract a synthetic sub-key (e.g., extracting the city from post content). This hierarchical approach can capture nuances that might be lost in a single large-scale aggregation.
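The two-stage idea can be illustrated without DocETL at all. The following is a minimal sketch in plain Python, not the directive's implementation: `summarize` is a hypothetical stand-in for an LLM reduce prompt, and the `state`, `city`, and `note` fields are invented sample data.

```python
from collections import defaultdict

def summarize(values):
    """Hypothetical stand-in for an LLM reduce prompt: sort and join strings."""
    return "; ".join(sorted(values))

def reduce_by(records, keys, field):
    """Group records by `keys` and summarize `field` within each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec[field])
    return [
        {**dict(zip(keys, key)), field: summarize(vals)}
        for key, vals in groups.items()
    ]

# Invented sample data for illustration only.
posts = [
    {"state": "CA", "city": "Oakland", "note": "transit delays"},
    {"state": "CA", "city": "Oakland", "note": "housing costs"},
    {"state": "CA", "city": "Fresno", "note": "air quality"},
    {"state": "NY", "city": "Albany", "note": "snow removal"},
]

# Level 1: aggregate at the finer granularity (reduce_key + additional_key).
by_city = reduce_by(posts, ["state", "city"], "note")
# Level 2: roll the city-level summaries up to the desired level (reduce_key only).
by_state = reduce_by(by_city, ["state"], "note")
```

Each second-level group here sees a handful of city summaries rather than every raw post, which is the property the hierarchical approach relies on.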

Usage

The MOAR agent applies this directive when a reduce operation processes many documents per group and would benefit from aggregating at a finer granularity before rolling up. It is useful when the data has a semantic hierarchy (e.g., aggregate by state+city first, then by state only) or when a single large-scale aggregation risks losing information.
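One rough way to spot "many documents per group" is to count documents per reduce key. The sketch below is an assumption for illustration: the helper names and the threshold of 50 are hypothetical, and this is not how `check_applicability` or the MOAR agent actually decide.

```python
from collections import Counter

def largest_group_size(docs, reduce_key):
    """Return the size of the biggest group formed by the reduce_key fields."""
    counts = Counter(tuple(doc[k] for k in reduce_key) for doc in docs)
    return max(counts.values(), default=0)

def looks_like_candidate(docs, reduce_key, threshold=50):
    """Hypothetical heuristic: flag reduces whose largest group exceeds threshold."""
    return largest_group_size(docs, reduce_key) > threshold

# Invented sample data: one large group and one small group.
docs = [{"state": "CA"}] * 60 + [{"state": "NY"}] * 3
is_candidate = looks_like_candidate(docs, ["state"])
```

A size check like this says nothing about whether a meaningful sub-key exists, so it would only ever be a first filter.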

Code Reference

Source Location

Signature

class HierarchicalReduceDirective(Directive):
    name = "hierarchical_reduce"
    description = "Transform a reduce into (Map* ->) Reduce -> Reduce for hierarchical two-level aggregation."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...

Import

from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective

I/O Contract

Inputs

Name | Type | Required | Description
op_config | Dict | Yes | Operation configuration to transform
pipeline_ops | List[Dict] | Yes | Full pipeline operations list
op_idx | int | Yes | Index of target operation
dataset_descriptions | Dict | Yes | Dataset schema descriptions

Outputs

Name | Type | Description
new_ops | List[Dict] | Transformed operation configs
new_steps | List[Dict] | Updated pipeline steps
explanation | str | Human-readable description of changes
metadata | dict | Additional metadata about the transformation
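To make the shape of the transformed operation configs more concrete, here is a hedged sketch of how a single reduce config could be split into a fine-grained reduce followed by a roll-up reduce. The function `sketch_hierarchical_ops`, the `_fine` and `_rollup` name suffixes, and the config fields shown are assumptions for illustration. The real directive may also rewrite prompts and insert a Map step, neither of which this sketch attempts.

```python
import copy

def sketch_hierarchical_ops(reduce_op, additional_key):
    """Hypothetical helper: split one reduce config into fine + roll-up configs."""
    keys = reduce_op["reduce_key"]
    keys = [keys] if isinstance(keys, str) else list(keys)

    # First reduce: group by reduce_key + additional_key.
    fine = copy.deepcopy(reduce_op)
    fine["name"] = f"{reduce_op['name']}_fine"
    fine["reduce_key"] = keys + [additional_key]

    # Second reduce: roll up to the original reduce_key only.
    rollup = copy.deepcopy(reduce_op)
    rollup["name"] = f"{reduce_op['name']}_rollup"
    rollup["reduce_key"] = keys
    return [fine, rollup]

# Invented example config; field names are assumed, not taken from the source.
original = {
    "name": "summarize_posts",
    "type": "reduce",
    "reduce_key": "state",
    "prompt": "Summarize the posts for this group.",
}
new_ops = sketch_hierarchical_ops(original, "city")
```

The original config is deep-copied rather than mutated, so the caller can still compare the pipeline before and after the rewrite.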

Usage Examples

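# Directives are typically invoked automatically by the MOAR agent.
# Manual invocation, assuming op_config, pipeline_ops, op_idx, and
# dataset_descriptions (see Inputs above) are already defined by the caller:
from docetl.reasoning_optimizer.directives.hierarchical_reduce import HierarchicalReduceDirective

directive = HierarchicalReduceDirective()
# check_applicability returns a (bool, str) pair
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    # apply returns the transformed ops, updated steps, an explanation, and metadata
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)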

Related Pages
