Implementation:Ucbepic Docetl Dataset Debate Gleaning


Knowledge Sources
Domains Sample_Data, Data_Processing
Last Updated 2026-02-08 00:00 GMT

Overview

JSON dataset containing theme evolution analyses of U.S. presidential debate transcripts, generated with DocETL's reduce gleaning feature enabled.

Description

This file contains the output of a theme evolution analysis pipeline run on presidential debate transcripts with DocETL's reduce gleaning enabled. As in the baseline variant, each record pairs a theme label with a long-form analytical report on how Democratic and Republican viewpoints on that theme evolved over multiple decades. The difference is that these reports were generated with reduce gleaning, which iteratively refines the reduce output with the aim of improving completeness and accuracy. Comparing this dataset against the baseline variant shows the effect of gleaning on report quality.
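The iterative refinement described above can be pictured as a generate / validate / refine loop. The following is a minimal conceptual sketch only, not DocETL's implementation: the function `glean`, its parameters, and the toy stand-ins for the model calls are all hypothetical names introduced for illustration.

```python
from typing import Callable, Tuple


def glean(
    generate: Callable[[], str],
    validate: Callable[[str], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    num_rounds: int = 1,
) -> str:
    """Illustrative refinement loop (hypothetical, not DocETL's code).

    generate  -> produces the first draft of the reduce output
    validate  -> returns (is_good_enough, feedback) for a draft
    refine    -> rewrites the draft using the feedback
    """
    output = generate()
    for _ in range(num_rounds):
        ok, feedback = validate(output)
        if ok:
            break  # the draft passed validation, stop early
        output = refine(output, feedback)
    return output


# Toy stand-ins for the model calls so the sketch runs without an LLM.
def toy_generate() -> str:
    return "# Report\n\n## Introduction\n"


def toy_validate(report: str) -> Tuple[bool, str]:
    if "## Conclusion" in report:
        return True, ""
    return False, "Add a conclusion section."


def toy_refine(report: str, feedback: str) -> str:
    # A real refine step would pass the feedback to a model.
    return report + "\n## Conclusion\n"


refined = glean(toy_generate, toy_validate, toy_refine, num_rounds=2)
print(refined)
```

With the toy functions, the first round fails validation and appends a conclusion, and the second round passes and exits the loop early.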

Usage

This dataset is stored in the example_data/debates directory and is used to demonstrate the effectiveness of DocETL's reduce gleaning feature. By comparing reports in this file against the corresponding baseline reports, users can see how gleaning produces more thorough and detailed analyses.

Code Reference

Source Location

Data Structure

[
  {
    "report": "# Evolution of Democratic and Republican Viewpoints on Panama Canal Control (1976 - 2023)\n\n## Introduction\nThe Panama Canal has played a pivotal role in U.S. foreign policy...",
    "theme": "Panama Canal Control"
  },
  {
    "report": "# Analysis of Leadership and Guiding Principles from 2000 to 2023\n\n## Introduction\n...",
    "theme": "Leadership and Guiding Principles"
  }
]

I/O Contract

Schema

Field | Type | Description
report | string | Long-form analytical report (Markdown formatted) examining the evolution of Democratic and Republican viewpoints on the theme, generated with reduce gleaning optimization for improved completeness
theme | string | The political theme being analyzed (e.g., "Panama Canal Control", "Nuclear Proliferation", "Trust in Government")
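A loaded file can be checked against this schema before use. The sketch below is one possible check, assuming only the two string fields listed above; `validate_records` and `REQUIRED_FIELDS` are hypothetical helper names, and the sample records are inline stand-ins rather than real dataset content.

```python
import json

# The two fields named in the schema; both are expected to be strings.
REQUIRED_FIELDS = ("report", "theme")


def validate_records(records):
    """Return a list of problems; an empty list means the schema is met."""
    if not isinstance(records, list):
        return ["top-level JSON value is not a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str):
                problems.append(f"record {i}: '{field}' missing or not a string")
            elif not value.strip():
                problems.append(f"record {i}: '{field}' is empty")
    return problems


# Inline stand-in data: the second record lacks a report.
sample = json.loads(
    '[{"report": "# Title\\n\\n## Introduction\\n", "theme": "Panama Canal Control"},'
    ' {"theme": "Arms Control"}]'
)
issues = validate_records(sample)
print(issues)
```

To check the real file, pass the result of `json.load` on it in place of `sample`.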

Themes Covered

The dataset contains reports on the following political themes:

  • Panama Canal Control
  • Leadership and Guiding Principles
  • The Middle East and Relations with Israel
  • Trust in Government
  • Nuclear Proliferation
  • Economic Aid, Childcare, and Healthcare
  • Achieving Prosperity
  • Accepting the Election Outcome
  • Vice Presidential Selection
  • Education and Youth Opportunities
  • Education Reform
  • American Prestige and Global Influence
  • Campaign Character and Tonality
  • Arms Control
  • Pardon and Amnesty for Draft Evaders
  • And additional themes

Usage Examples

import json

# Load both variants; JSON files are read as UTF-8 regardless of platform default
with open("example_data/debates/theme_evolution_analysis_baseline.json", encoding="utf-8") as f:
    baseline = json.load(f)
with open("example_data/debates/theme_evolution_analysis_reduce_gleaning.json", encoding="utf-8") as f:
    gleaning = json.load(f)

print(f"Baseline reports: {len(baseline)}")
print(f"Gleaning reports: {len(gleaning)}")

# Compare report lengths for every pair of records that share a theme
for b in baseline:
    for g in gleaning:
        if b["theme"] == g["theme"]:
            print(f"Theme: {b['theme']}")
            print(f"  Baseline length: {len(b['report'])} chars")
            print(f"  Gleaning length: {len(g['report'])} chars")
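Character count alone is a coarse signal. As a rough supplement, the sketch below also counts words and Markdown headings per report; `report_stats` and `compare_reports` are hypothetical helpers, the two sample strings are invented stand-ins rather than dataset content, and none of these counts measures accuracy.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "## Section".
HEADING_RE = re.compile(r"^#{1,6}[ \t]+\S", flags=re.MULTILINE)


def report_stats(report: str) -> dict:
    """Simple size and structure counts for one Markdown report."""
    return {
        "chars": len(report),
        "words": len(report.split()),
        "headings": len(HEADING_RE.findall(report)),
    }


def compare_reports(baseline_report: str, gleaning_report: str) -> dict:
    """Per-metric difference (gleaning minus baseline)."""
    b = report_stats(baseline_report)
    g = report_stats(gleaning_report)
    return {key: g[key] - b[key] for key in b}


# Invented stand-in reports for illustration.
base = "# T\n\n## Introduction\nShort text."
glean_out = base + "\n\n## Conclusion\nMore text here."

delta = compare_reports(base, glean_out)
print(delta)
```

In the comparison loop, `compare_reports(b["report"], g["report"])` could replace the two length printouts.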
