

Implementation:Ucbepic Docetl Directive MapToMapResolveReduce

From Leeroopedia


Knowledge Sources
Domains Pipeline_Optimization, LLM_Operations
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the DocETL reasoning optimizer, that inserts a Resolve operation between a Map and a Reduce so that entities are deduplicated before aggregation.

Description

The MapToMapResolveReduceDirective class inserts a Resolve operation between a Map and a Reduce to deduplicate or normalize entities before aggregation. The inserted Resolve operation uses blocking conditions written as code to decide which pairs of records are worth comparing, so the expensive comparison step does not have to run on all O(n^2) pairs. This is useful when the Map output contains duplicate or near-duplicate entities that should be merged before the Reduce step.
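To illustrate why blocking matters, the following is a minimal sketch and not DocETL's implementation. It assumes blocking conditions are Python expressions over `input1` and `input2` (in the style of a Resolve `blocking_conditions` list) and treats a pair as a candidate if any condition holds. The function `llm_says_match` is a hypothetical stand-in for the LLM comparison prompt, and matched records are grouped with union-find.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical blocking condition: compare only records whose last name token matches.
BLOCKING_CONDITIONS = [
    "input1['name'].lower().split()[-1] == input2['name'].lower().split()[-1]",
]

def passes_blocking(input1: Dict, input2: Dict) -> bool:
    """Return True if any blocking condition holds for the pair (sketch semantics)."""
    env = {"input1": input1, "input2": input2}
    return any(eval(cond, {"__builtins__": {}}, env) for cond in BLOCKING_CONDITIONS)

def llm_says_match(a: Dict, b: Dict) -> bool:
    """Hypothetical stand-in for the LLM comparison prompt: ignore dots and case."""
    return a["name"].replace(".", "").lower() == b["name"].replace(".", "").lower()

def resolve(records: List[Dict]) -> Tuple[List[Tuple[int, int]], List[List[str]]]:
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: keep only the pairs that satisfy a blocking condition.
    candidates = [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if passes_blocking(records[i], records[j])
    ]
    # Comparison: run the (expensive) matcher only on candidate pairs.
    for i, j in candidates:
        if llm_says_match(records[i], records[j]):
            parent[find(j)] = find(i)

    # Group records by their union-find root.
    clusters: Dict[int, List[str]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i]["name"])
    return candidates, list(clusters.values())

records = [
    {"name": "J. Smith"},
    {"name": "J Smith"},
    {"name": "Jane Doe"},
    {"name": "jane doe"},
    {"name": "Ann Lee"},
]
candidates, clusters = resolve(records)
n = len(records)
print(f"compared {len(candidates)} of {n * (n - 1) // 2} possible pairs")
print(clusters)
```

Here only 2 of the 10 possible pairs reach the matcher. A real Resolve step would additionally produce a canonical value for each cluster (for example via a resolution prompt); the sketch stops at grouping.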

Usage

The MOAR agent applies this directive when a Map operation produces outputs that may contain duplicates, variations, or near-duplicates (e.g., different spellings of names, similar categories), and these should be normalized before the Reduce aggregation step. The target must be a Map operation followed by a Reduce operation.
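The structural precondition can be sketched as a small check over a list of operation dicts. This is a hypothetical helper, not the directive's actual `check_applicability`; it assumes each operation dict carries a `"type"` key such as `"map"` or `"reduce"`, and it assumes the Reduce must come immediately after the Map.

```python
from typing import Dict, List, Tuple

def check_map_then_reduce(pipeline_ops: List[Dict], op_idx: int) -> Tuple[bool, str]:
    """Hypothetical check: is pipeline_ops[op_idx] a map directly followed by a reduce?"""
    if not 0 <= op_idx < len(pipeline_ops):
        return False, "op_idx is out of range"
    if pipeline_ops[op_idx].get("type") != "map":
        return False, "target operation is not a map"
    if op_idx + 1 >= len(pipeline_ops):
        return False, "map is the last operation; no reduce follows it"
    if pipeline_ops[op_idx + 1].get("type") != "reduce":
        return False, "the operation after the map is not a reduce"
    return True, "map is directly followed by a reduce"

# Hypothetical two-step pipeline.
ops = [
    {"name": "extract_people", "type": "map"},
    {"name": "summarize_by_person", "type": "reduce"},
]
print(check_map_then_reduce(ops, 0))
print(check_map_then_reduce(ops, 1))
```

Returning a reason string alongside the boolean mirrors the `Tuple[bool, str]` return type listed in the signature, which lets a caller report why a rewrite was skipped.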

Code Reference

Source Location

Signature

class MapToMapResolveReduceDirective(Directive):
    name = "map_to_map_resolve_reduce"
    description = "Insert a Resolve operation between Map and Reduce for entity deduplication before aggregation."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...

Import

from docetl.reasoning_optimizer.directives.map_to_map_resolve_reduce import MapToMapResolveReduceDirective

I/O Contract

Inputs

Name                  Type        Required  Description
op_config             Dict        Yes       Operation configuration to transform
pipeline_ops          List[Dict]  Yes       Full pipeline operations list
op_idx                int         Yes       Index of target operation
dataset_descriptions  Dict        Yes       Dataset schema descriptions

Outputs

Name         Type        Description
new_ops      List[Dict]  Transformed operation configs
new_steps    List[Dict]  Updated pipeline steps
explanation  str         Human-readable description of changes
metadata     dict        Additional metadata about the transformation
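The core rewrite can be pictured as splicing a new operation into the list of operation configs. The sketch below is hypothetical and simplified. It assumes the Reduce config has a single string `reduce_key` that is used as the field to deduplicate. The Resolve fields (`blocking_conditions`, `comparison_prompt`, `resolution_prompt`, `output`) follow the general shape of a DocETL Resolve config but are illustrative here. It returns only the new operation list and an explanation, and omits `new_steps` and `metadata`.

```python
import copy
from typing import Dict, List, Tuple

def insert_resolve(pipeline_ops: List[Dict], op_idx: int) -> Tuple[List[Dict], str]:
    """Hypothetical rewrite: put a resolve op between the map at op_idx and the next reduce."""
    map_op = pipeline_ops[op_idx]
    reduce_op = pipeline_ops[op_idx + 1]
    key = reduce_op["reduce_key"]  # assumption: a single string key

    resolve_op = {
        "name": "resolve_%s" % key,
        "type": "resolve",
        # Illustrative blocking: only compare values sharing a 3-character prefix.
        "blocking_conditions": [
            "input1['%s'].lower()[:3] == input2['%s'].lower()[:3]" % (key, key)
        ],
        "comparison_prompt": (
            "Do {{ input1.%s }} and {{ input2.%s }} refer to the same entity?" % (key, key)
        ),
        "resolution_prompt": (
            "Choose one canonical %s for these values: "
            "{%% for e in inputs %%}{{ e.%s }}; {%% endfor %%}" % (key, key)
        ),
        "output": {"schema": {key: "string"}},
    }

    # Copy so the caller's pipeline is left untouched.
    new_ops = copy.deepcopy(pipeline_ops)
    new_ops.insert(op_idx + 1, resolve_op)
    explanation = "Inserted %s between %s and %s." % (
        resolve_op["name"], map_op["name"], reduce_op["name"]
    )
    return new_ops, explanation

ops = [
    {"name": "extract_people", "type": "map"},
    {"name": "summarize_by_person", "type": "reduce", "reduce_key": "person"},
]
new_ops, explanation = insert_resolve(ops, 0)
print([op["type"] for op in new_ops])
print(explanation)
```

Deduplicating on the reduce key is a natural choice because that is the field whose variants would otherwise split one entity into several Reduce groups.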

Usage Examples

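# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation; op_config, pipeline_ops, op_idx and
# dataset_descriptions are assumed to be defined already (see I/O Contract).
from docetl.reasoning_optimizer.directives.map_to_map_resolve_reduce import MapToMapResolveReduceDirective

directive = MapToMapResolveReduceDirective()

# Confirm the target is a Map followed by a Reduce before rewriting.
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)
    print(explanation)
else:
    print(f"Directive not applicable: {reason}")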
