Implementation:Ucbepic Docetl Dataset Debate Gleaning

Knowledge Sources	Ucbepic_Docetl
Domains	Sample_Data, Data_Processing
Last Updated	2026-02-08 00:00 GMT

Overview

JSON dataset of theme-evolution reports derived from U.S. presidential debate transcripts, produced by a DocETL pipeline run with reduce gleaning enabled.

Description

This file contains the results of a theme evolution analysis pipeline run on presidential debate transcripts using DocETL's reduce gleaning optimization. Like the baseline variant, each record contains a long-form analytical report examining the evolution of Democratic and Republican viewpoints on a specific political theme over multiple decades, along with the theme label. The key difference is that these reports were generated with the reduce gleaning feature enabled, which iteratively refines the reduce output to improve completeness and accuracy. Comparing this dataset against the baseline variant demonstrates the quality improvements achievable through DocETL's gleaning optimization.
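The iterative refinement described above can be pictured as a validate-then-refine loop. The following is a conceptual sketch only, not DocETL's implementation: `refine_with_gleaning`, `toy_validate`, `toy_refine`, and the `num_rounds` parameter are hypothetical names chosen for illustration, and the toy functions stand in for what would be LLM calls in a real pipeline.

```python
from typing import Callable, Tuple


def refine_with_gleaning(
    draft: str,
    validate: Callable[[str], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    num_rounds: int = 2,
) -> str:
    """Run up to num_rounds validate-then-refine passes over a draft output."""
    output = draft
    for _ in range(num_rounds):
        complete, feedback = validate(output)
        if complete:
            break  # the validator is satisfied, so stop early
        output = refine(output, feedback)
    return output


# Toy stand-ins for the validation and refinement steps (hypothetical).
REQUIRED_SECTIONS = ["## Introduction", "## Conclusion"]


def toy_validate(text: str) -> Tuple[bool, str]:
    # Report which required sections are still missing from the draft.
    missing = [s for s in REQUIRED_SECTIONS if s not in text]
    return (not missing, ", ".join(missing))


def toy_refine(text: str, feedback: str) -> str:
    # Append the first missing section named in the feedback.
    first_missing = feedback.split(", ")[0]
    return text + "\n\n" + first_missing + "\n..."


result = refine_with_gleaning("# Report", toy_validate, toy_refine, num_rounds=3)
print(result)
```

Capping the number of rounds bounds the cost of refinement, while the early exit avoids extra passes once the validator reports nothing missing.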

Usage

This dataset is stored in the example_data/debates directory and is used to demonstrate the effectiveness of DocETL's reduce gleaning feature. By comparing reports in this file against the corresponding baseline reports, users can see how gleaning produces more thorough and detailed analyses.
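One way to inspect how a gleaning report differs from its baseline counterpart is a line-level diff of the two Markdown strings. The sketch below uses Python's standard `difflib`; the helper name `report_diff` and the two short reports are made-up placeholders, not content from either dataset.

```python
import difflib


def report_diff(baseline_report: str, gleaning_report: str) -> str:
    """Return a unified diff of two Markdown reports, line by line."""
    lines = difflib.unified_diff(
        baseline_report.splitlines(),
        gleaning_report.splitlines(),
        fromfile="baseline",
        tofile="reduce_gleaning",
        lineterm="",
    )
    return "\n".join(lines)


# Placeholder reports for illustration; real reports come from the JSON files.
baseline_report = "# Theme\n\n## Introduction\nShort summary."
gleaning_report = (
    "# Theme\n\n## Introduction\nShort summary.\n\n## Key Shifts\nAdded detail."
)

diff_text = report_diff(baseline_report, gleaning_report)
print(diff_text)
```

Lines prefixed with `+` appear only in the gleaning report, which makes added sections easy to spot.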

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: example_data/debates/theme_evolution_analysis_reduce_gleaning.json
Lines: 614

Data Structure

[
  {
    "report": "# Evolution of Democratic and Republican Viewpoints on Panama Canal Control (1976 - 2023)\n\n## Introduction\nThe Panama Canal has played a pivotal role in U.S. foreign policy...",
    "theme": "Panama Canal Control"
  },
  {
    "report": "# Analysis of Leadership and Guiding Principles from 2000 to 2023\n\n## Introduction\n...",
    "theme": "Leadership and Guiding Principles"
  }
]
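Because each `report` value is a single Markdown string with escaped newlines, its section structure can be recovered by scanning for heading lines. The sketch below is a minimal illustration; the helper name `outline` is hypothetical, and the sample string is the truncated excerpt shown above rather than a full report.

```python
import re
from typing import List, Tuple

# Match ATX-style Markdown headings such as "# Title" or "## Introduction".
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*\S)[ \t]*$", re.MULTILINE)


def outline(report: str) -> List[Tuple[int, str]]:
    """Return (heading level, heading text) pairs in document order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(report)]


report = (
    "# Evolution of Democratic and Republican Viewpoints on "
    "Panama Canal Control (1976 - 2023)\n\n"
    "## Introduction\n"
    "The Panama Canal has played a pivotal role in U.S. foreign policy..."
)

for level, title in outline(report):
    print("  " * (level - 1) + title)
```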

I/O Contract

Schema

Field	Type	Description
report	string	Long-form analytical report (Markdown formatted) examining the evolution of Democratic and Republican viewpoints on the theme, generated with reduce gleaning optimization for improved completeness
theme	string	The political theme being analyzed (e.g., "Panama Canal Control", "Nuclear Proliferation", "Trust in Government")
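The schema above can be checked mechanically before the data is used. The following sketch assumes only what the table states (a JSON array of objects with string `report` and `theme` fields); the function name `validate_records` and the inline sample are illustrative, and the second sample record is deliberately incomplete to show a reported problem.

```python
import json
from typing import Any, List

REQUIRED_FIELDS = ("report", "theme")


def validate_records(records: Any) -> List[str]:
    """Return human-readable problems; an empty list means the schema holds."""
    if not isinstance(records, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str):
                problems.append(f"record {i}: '{field}' is missing or not a string")
            elif not value.strip():
                problems.append(f"record {i}: '{field}' is empty")
    return problems


# Inline sample; in practice, pass the result of json.load() on the dataset file.
sample = json.loads(
    '[{"report": "# Title\\n\\n## Introduction\\n...", "theme": "Panama Canal Control"},'
    ' {"theme": "Arms Control"}]'
)
problems = validate_records(sample)
print(problems)
```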

Themes Covered

The dataset contains reports on the following political themes:

Panama Canal Control
Leadership and Guiding Principles
The Middle East and Relations with Israel
Trust in Government
Nuclear Proliferation
Economic Aid, Childcare, and Healthcare
Achieving Prosperity
Accepting the Election Outcome
Vice Presidential Selection
Education and Youth Opportunities
Education Reform
American Prestige and Global Influence
Campaign Character and Tonality
Arms Control
Pardon and Amnesty for Draft Evaders
Additional themes not listed here
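To confirm that the themes listed above are present in a loaded copy of the dataset, the `theme` values can be compared against the list. This sketch assumes the labels in the file match the strings above exactly; `missing_themes` is a hypothetical helper, and the two-record sample is a placeholder standing in for the loaded JSON.

```python
from typing import Dict, Iterable, List

# Theme labels copied from the list above.
LISTED_THEMES = [
    "Panama Canal Control",
    "Leadership and Guiding Principles",
    "The Middle East and Relations with Israel",
    "Trust in Government",
    "Nuclear Proliferation",
    "Economic Aid, Childcare, and Healthcare",
    "Achieving Prosperity",
    "Accepting the Election Outcome",
    "Vice Presidential Selection",
    "Education and Youth Opportunities",
    "Education Reform",
    "American Prestige and Global Influence",
    "Campaign Character and Tonality",
    "Arms Control",
    "Pardon and Amnesty for Draft Evaders",
]


def missing_themes(records: Iterable[Dict[str, str]], expected: List[str]) -> List[str]:
    """Return the expected theme labels that no record carries."""
    present = {rec["theme"] for rec in records}
    return [theme for theme in expected if theme not in present]


# Placeholder records; replace with the list returned by json.load().
sample = [
    {"theme": "Panama Canal Control", "report": "..."},
    {"theme": "Arms Control", "report": "..."},
]
missing = missing_themes(sample, LISTED_THEMES)
print(f"{len(missing)} listed themes not found in the sample")
```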

Usage Examples

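import json

# Load the baseline and gleaning outputs (paths are relative to the repository root)
with open("example_data/debates/theme_evolution_analysis_baseline.json", encoding="utf-8") as f:
    baseline = json.load(f)
with open("example_data/debates/theme_evolution_analysis_reduce_gleaning.json", encoding="utf-8") as f:
    gleaning = json.load(f)

print(f"Baseline reports: {len(baseline)}")
print(f"Gleaning reports: {len(gleaning)}")

# Index gleaning reports by theme so each baseline report needs a single lookup
gleaning_by_theme = {g["theme"]: g["report"] for g in gleaning}

# Compare report lengths for themes present in both files
for b in baseline:
    gleaning_report = gleaning_by_theme.get(b["theme"])
    if gleaning_report is None:
        continue  # theme has no counterpart in the gleaning output
    print(f"Theme: {b['theme']}")
    print(f"  Baseline length: {len(b['report'])} chars")
    print(f"  Gleaning length: {len(gleaning_report)} chars")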

Related Pages

Environment:Ucbepic_Docetl_Python_Runtime
