Implementation:Ucbepic Docetl Directive Gleaning
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the DocETL reasoning optimizer, that adds a judge-LLM validation loop to Map operations.
Description
The GleaningDirective class adds a validation loop to Map operations: after each LLM generation, a separate "judge" LLM evaluates the output using a yes/no validation prompt. If the output fails validation, the original LLM refines its answer, and the cycle repeats until the output passes or the maximum number of rounds is reached. This enables iterative quality improvement for extraction and analysis tasks without changing the pipeline structure.
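The generate, judge, refine cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DocETL implementation: the `glean` function, its callable parameters, and the stub LLMs are all hypothetical names introduced here, and the stubs stand in for real model calls.

```python
from typing import Callable, Tuple


def glean(
    generate: Callable[[str], str],
    judge: Callable[[str], Tuple[bool, str]],
    refine: Callable[[str, str, str], str],
    prompt: str,
    max_rounds: int = 2,
) -> Tuple[str, int]:
    """Run a validation loop; return the final output and the refinement rounds used.

    All callables are hypothetical stand-ins for LLM calls (an assumption of this sketch).
    """
    output = generate(prompt)
    rounds_used = 0
    for _ in range(max_rounds):
        passed, feedback = judge(output)  # yes/no verdict plus feedback text
        if passed:
            break
        # The original generator revises its answer using the judge's feedback.
        output = refine(prompt, output, feedback)
        rounds_used += 1
    return output, rounds_used


# Stub "LLMs" so the sketch runs without any model access.
def fake_generate(prompt: str) -> str:
    return "short"


def fake_judge(output: str) -> Tuple[bool, str]:
    if len(output) >= 10:
        return True, ""
    return False, "too short"


def fake_refine(prompt: str, output: str, feedback: str) -> str:
    return output + " with more detail"


final_output, rounds = glean(fake_generate, fake_judge, fake_refine, "Summarize the report")
```

With these stubs, the first output fails the length check, one refinement round runs, and the second verdict passes. Capping the loop at `max_rounds` bounds the extra LLM cost even when the judge never approves.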
Usage
The MOAR agent applies this directive when initial Map outputs may not meet quality criteria and must be checked or improved automatically (e.g., outputs are too short, missing required information, or not meeting formatting requirements).
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/gleaning.py
- Lines: 1-231
Signature
class GleaningDirective(Directive):
    name = "gleaning"
    description = "Adds a validation loop to Map: after each LLM generation, a judge LLM evaluates and refines the output."
    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
Import
from docetl.reasoning_optimizer.directives.gleaning import GleaningDirective
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
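To make the shape of a transformed operation more concrete, the sketch below attaches a gleaning block to a Map operation config. It assumes the transformed op carries a `gleaning` entry with `num_rounds` and `validation_prompt` keys, which is how DocETL Map operations are commonly configured for gleaning; the exact fields emitted by this directive are not confirmed here. The helper `with_gleaning` and the sample operation are hypothetical.

```python
import copy
from typing import Any, Dict


def with_gleaning(
    op_config: Dict[str, Any],
    validation_prompt: str,
    num_rounds: int = 1,
) -> Dict[str, Any]:
    """Return a copy of a Map op config with a gleaning block attached.

    Hypothetical helper: the key names are an assumption of this sketch.
    """
    new_op = copy.deepcopy(op_config)  # leave the caller's config untouched
    new_op["gleaning"] = {
        "num_rounds": num_rounds,
        "validation_prompt": validation_prompt,
    }
    return new_op


# Sample Map operation (illustrative values only).
map_op = {
    "name": "extract_findings",
    "type": "map",
    "prompt": "List the key findings in: {{ input.text }}",
    "output": {"schema": {"findings": "list[str]"}},
}

gleaned_op = with_gleaning(
    map_op,
    validation_prompt="Does the output list every finding with supporting detail? Answer yes or no.",
    num_rounds=2,
)
```

Only the target operation changes; the rest of the pipeline keeps its structure, which matches the directive's stated goal of improving quality without restructuring.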
Usage Examples
# Directives are typically invoked by the MOAR agent automatically
# Example of manual invocation:
from docetl.reasoning_optimizer.directives.gleaning import GleaningDirective
directive = GleaningDirective()
# op_config, pipeline_ops, op_idx, and dataset_descriptions are assumed to be defined by the caller
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)