Implementation:Ucbepic Docetl Dataset Debate Baseline
| Knowledge Sources | |
|---|---|
| Domains | Sample_Data, Data_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
JSON dataset providing baseline analysis output of theme evolution across U.S. presidential debate transcripts, generated without the reduce gleaning optimization in DocETL.
Description
This file contains the results of a theme evolution analysis pipeline run on presidential debate transcripts using a standard (non-gleaning) reduce operation. Each record contains a long-form analytical report examining how Democratic and Republican viewpoints on a specific political theme have evolved over multiple decades, along with the theme label. The reports cover topics such as Experience and Leadership, Race Relations, Family Values, Arms Control, Immigration, and many others. This dataset serves as a baseline for comparison against the gleaning-optimized variant to demonstrate the quality improvements that DocETL's reduce gleaning feature provides.
Usage
This dataset is stored in the example_data/debates directory. It serves as the comparison baseline against the reduce gleaning variant, showing how different pipeline configurations affect output quality and completeness.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: example_data/debates/theme_evolution_analysis_baseline.json
- Lines: 608
Data Structure
[
  {
    "report": "# Analysis of Democratic and Republican Viewpoints on 'Experience and Leadership' (2000 - 2023)\n\n## Introduction\nThe theme of \"Experience and Leadership\" has been a focal point in American politics...",
    "theme": "Experience and Leadership"
  },
  {
    "report": "# Evolution of Democratic and Republican Viewpoints on \"Character and Experience\" (1992 - 2023)\n\n## Introduction\n...",
    "theme": "Character and Experience"
  }
]
I/O Contract
Schema
| Field | Type | Description |
|---|---|---|
| report | string | Long-form analytical report (Markdown formatted) examining the evolution of Democratic and Republican viewpoints on the theme across multiple election cycles |
| theme | string | The political theme being analyzed (e.g., "Experience and Leadership", "Race Relations", "Family Values") |
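The schema above can be checked mechanically before the data is used downstream. The following is a minimal sketch, not part of DocETL: the helper name `validate_records` is hypothetical, and it assumes only what the table states, namely that each record is an object with string-valued `report` and `theme` fields.

```python
def validate_records(records):
    """Return a list of schema problems; an empty list means every record conforms."""
    if not isinstance(records, list):
        return ["top-level value is not a list"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: not an object")
            continue
        # Both fields in the schema table are strings.
        for field in ("report", "theme"):
            if not isinstance(record.get(field), str):
                problems.append(f"record {i}: '{field}' is missing or not a string")
    return problems


# Toy records shaped like the schema (illustrative only, not real dataset content).
sample = [
    {"report": "# Heading\n\nBody text.", "theme": "Race Relations"},
    {"theme": "Soviet Union"},  # deliberately missing 'report'
]
for problem in validate_records(sample):
    print(problem)
```

The same function can be applied to the list returned by `json.load` on the baseline file.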
Themes Covered
The dataset contains reports on the following political themes:
- Experience and Leadership
- Character and Experience
- Illegal Immigration
- Race Relations
- Central America
- Family Values
- Veterans Affairs
- Infrastructure Development
- Education and Values
- Cuba and Foreign Policy
- Pardon and Amnesty for Draft Evaders
- Human Rights and Morality in Foreign Policy
- Campaign Character and Tonality
- Soviet Union
- Peaceful Transfer of Power and January 6
- Additional themes not listed here
Usage Examples
import json

# Load the baseline theme analyses (a JSON array of records).
with open("example_data/debates/theme_evolution_analysis_baseline.json", encoding="utf-8") as f:
    data = json.load(f)

# data is a list of theme analysis records with fields: report, theme
print(f"Total theme analyses: {len(data)}")
for record in data:
    print(f"Theme: {record['theme']}")
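Because this file is meant to be compared against the gleaning variant, a side-by-side summary per theme can be useful. The sketch below is one possible approach under stated assumptions: the helper names are hypothetical, the variant is assumed to share the same `report`/`theme` schema, and report word count is used only as a crude proxy for completeness, not as a quality metric defined by DocETL. It operates on already-loaded lists, since the variant's file path is not given here.

```python
def word_counts_by_theme(records):
    """Map each theme label to the word count of its report.

    If a theme appears more than once, the last record wins.
    """
    return {r["theme"]: len(r["report"].split()) for r in records}


def compare_report_lengths(baseline, variant):
    """For themes present in both runs, return {theme: (baseline_words, variant_words)}."""
    base = word_counts_by_theme(baseline)
    other = word_counts_by_theme(variant)
    shared = sorted(base.keys() & other.keys())
    return {theme: (base[theme], other[theme]) for theme in shared}


# Toy records for illustration only; real reports are far longer.
baseline = [{"theme": "Soviet Union", "report": "# Title\n\nShort report."}]
variant = [{"theme": "Soviet Union", "report": "# Title\n\nA somewhat longer report."}]

for theme, (b, v) in compare_report_lengths(baseline, variant).items():
    print(f"{theme}: baseline={b} words, variant={v} words")
```

Themes that appear in only one of the two runs are skipped, so the result covers only directly comparable reports.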