Implementation:Ucbepic Docetl Dataset Debate Gleaning
| Knowledge Sources | |
|---|---|
| Domains | Sample_Data, Data_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
JSON dataset containing theme-evolution reports for U.S. presidential debate transcripts, generated with DocETL's reduce gleaning feature enabled.
Description
This file contains the results of a theme evolution analysis pipeline run on presidential debate transcripts with DocETL's reduce gleaning optimization. Like the baseline variant, each record pairs a theme label with a long-form analytical report on how Democratic and Republican viewpoints on that theme evolved over multiple decades. The key difference is that these reports were generated with reduce gleaning enabled, which iteratively refines the reduce output to improve completeness and accuracy. Comparing this dataset against the baseline variant, theme by theme, shows the quality improvements that gleaning is intended to deliver.
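The iterative refinement described above can be pictured as a validate-then-refine loop. The following is a conceptual sketch only, not DocETL's implementation: `glean`, `validate`, `refine`, and `num_rounds` are hypothetical names chosen for illustration, and the toy functions stand in for what would be LLM calls in a real pipeline.

```python
from typing import Callable, Tuple


def glean(
    initial_output: str,
    validate: Callable[[str], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    num_rounds: int = 1,
) -> str:
    """Refine an output for at most num_rounds rounds, stopping once validation passes."""
    output = initial_output
    for _ in range(num_rounds):
        ok, feedback = validate(output)
        if ok:
            break  # nothing left to improve
        output = refine(output, feedback)
    return output


# Toy stand-ins (assumptions): a real validator and refiner would call an LLM.
def toy_validate(report: str) -> Tuple[bool, str]:
    missing = [s for s in ("## Introduction", "## Conclusion") if s not in report]
    return (not missing, ", ".join(missing))


def toy_refine(report: str, feedback: str) -> str:
    # Append a placeholder for every section named in the feedback.
    return report + "".join(
        f"\n\n{section}\n(to be written)" for section in feedback.split(", ")
    )


draft = "# Report\n\n## Introduction\nText."
final = glean(draft, toy_validate, toy_refine, num_rounds=2)
print(final)
```

The loop stops early when the validator is satisfied, so extra rounds cost nothing once the output is judged complete.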
Usage
This dataset is stored in the example_data/debates directory and is used to demonstrate the effectiveness of DocETL's reduce gleaning feature. By comparing reports in this file against the corresponding baseline reports, users can see how gleaning produces more thorough and detailed analyses.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: example_data/debates/theme_evolution_analysis_reduce_gleaning.json
- Lines: 614
Data Structure
[
  {
    "report": "# Evolution of Democratic and Republican Viewpoints on Panama Canal Control (1976 - 2023)\n\n## Introduction\nThe Panama Canal has played a pivotal role in U.S. foreign policy...",
    "theme": "Panama Canal Control"
  },
  {
    "report": "# Analysis of Leadership and Guiding Principles from 2000 to 2023\n\n## Introduction\n...",
    "theme": "Leadership and Guiding Principles"
  }
]
I/O Contract
Schema
| Field | Type | Description |
|---|---|---|
| report | string | Long-form analytical report (Markdown formatted) examining the evolution of Democratic and Republican viewpoints on the theme, generated with reduce gleaning optimization for improved completeness |
| theme | string | The political theme being analyzed (e.g., "Panama Canal Control", "Nuclear Proliferation", "Trust in Government") |
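A loaded file can be checked against the schema above with a few lines of Python. This is a minimal sketch: `find_schema_problems` is a hypothetical helper, not part of DocETL, and the inline sample stands in for the real file contents.

```python
import json
from typing import Any, List

REQUIRED_FIELDS = ("report", "theme")


def find_schema_problems(records: Any) -> List[str]:
    """Return a human-readable message for every record that violates the schema."""
    if not isinstance(records, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: '{field}' missing or not a non-empty string")
    return problems


# Inline sample data (made up for illustration); the second record lacks "report".
sample = json.loads(
    '[{"report": "# Title\\n\\nBody", "theme": "Arms Control"},'
    ' {"theme": "Education Reform"}]'
)
print(find_schema_problems(sample))
```

To check the actual dataset, pass the result of `json.load` on the file listed under Source Location instead of `sample`.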
Themes Covered
The dataset contains reports on the following political themes:
- Panama Canal Control
- Leadership and Guiding Principles
- The Middle East and Relations with Israel
- Trust in Government
- Nuclear Proliferation
- Economic Aid, Childcare, and Healthcare
- Achieving Prosperity
- Accepting the Election Outcome
- Vice Presidential Selection
- Education and Youth Opportunities
- Education Reform
- American Prestige and Global Influence
- Campaign Character and Tonality
- Arms Control
- Pardon and Amnesty for Draft Evaders
- And additional themes
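Because the list above is not exhaustive, the full set of themes is best read from the file itself. The sketch below assumes only the two-field record schema; `theme_counts` is a hypothetical helper and the inline records are placeholders rather than real dataset content.

```python
from collections import Counter
from typing import Dict, Iterable


def theme_counts(records: Iterable[Dict[str, str]]) -> Counter:
    """Count how many reports exist for each theme label."""
    return Counter(rec["theme"] for rec in records)


# Placeholder records for illustration; load the JSON file to inspect real data.
records = [
    {"theme": "Arms Control", "report": "..."},
    {"theme": "Education Reform", "report": "..."},
    {"theme": "Arms Control", "report": "..."},
]

counts = theme_counts(records)
for theme in sorted(counts):
    print(f"{theme}: {counts[theme]} report(s)")
```

A count above one for any theme signals duplicate labels, which matters when pairing records with the baseline variant by theme.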
Usage Examples
import json

# Load the baseline and reduce-gleaning outputs (paths are relative to the repository root)
with open("example_data/debates/theme_evolution_analysis_baseline.json") as f:
    baseline = json.load(f)
with open("example_data/debates/theme_evolution_analysis_reduce_gleaning.json") as f:
    gleaning = json.load(f)

print(f"Baseline reports: {len(baseline)}")
print(f"Gleaning reports: {len(gleaning)}")

# Compare report lengths for every pair of records that share a theme
for b in baseline:
    for g in gleaning:
        if b["theme"] == g["theme"]:
            print(f"Theme: {b['theme']}")
            print(f"  Baseline length: {len(b['report'])} chars")
            print(f"  Gleaning length: {len(g['report'])} chars")