Implementation:Ucbepic Docetl Dataset Debate Transcripts

Knowledge Sources	Ucbepic_Docetl
Domains	Sample_Data, Data_Processing
Last Updated	2026-02-08 00:00 GMT

Overview

JSON dataset providing a curated collection of U.S. presidential and vice-presidential debate transcripts spanning from 1960 to 2024, used as input data for the DocETL website demo.

Description

This file contains full-text transcripts of notable U.S. presidential and vice-presidential debates organized chronologically. Each record includes the debate year, date, title, and complete transcript text. The collection spans over six decades of American political discourse, from the Kennedy-Nixon debates of 1960 through the Biden-Trump and Harris-Trump debates of 2024. This dataset serves as the primary input for the debate theme evolution analysis pipeline demonstrated on the DocETL website.
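To see how the records are distributed across the six decades, you can bucket them by decade. The sketch below is a minimal example that relies only on the integer `year` field. The helper name `count_by_decade` is hypothetical. The three inline records copy the structure sample and stand in for the full file.

```python
from collections import Counter

def count_by_decade(records):
    """Count debate records per decade (1960, 1970, ...) using the integer `year` field."""
    counts = Counter((rec["year"] // 10) * 10 for rec in records)
    # Return decades in ascending order for readable output
    return dict(sorted(counts.items()))

# Inline stand-in for the real file: the three records from the structure sample
sample = [
    {"year": 2024, "date": "June 27, 2024", "title": "The Biden-Trump Presidential Debate"},
    {"year": 2024, "date": "September 10, 2024", "title": "The Harris-Trump Presidential Debate"},
    {"year": 2016, "date": "October 9, 2016", "title": "The Second Clinton-Trump Presidential Debate"},
]

print(count_by_decade(sample))  # {2010: 1, 2020: 2}
```

If you run this on the full file, the output shows which decades the collection covers more heavily.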

Usage

This dataset is stored in the website/public directory and is served as a static asset on the DocETL website. It is used as the input dataset for the presidential debate theme evolution analysis demo, which processes these transcripts through DocETL map and reduce operations to generate thematic analyses of how political viewpoints have evolved over time.
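The demo processes the data as a map step followed by a reduce step. The plain-Python sketch below illustrates that shape only. It is not DocETL's API. In the actual demo, each step is an LLM-backed DocETL operation. Here, a hypothetical keyword table (`THEME_KEYWORDS`) stands in for the model. The transcript field name `content` is also an assumption, because this page does not document the key that holds the transcript text. The two input strings are placeholders, not transcript excerpts.

```python
from collections import defaultdict

# Hypothetical theme keywords; the real demo uses LLM prompts, not keyword matching.
THEME_KEYWORDS = {
    "economy": ["economy", "jobs", "inflation", "taxes"],
    "foreign policy": ["military", "allies", "defense", "foreign"],
    "health care": ["health", "medicare", "insurance"],
}

def map_themes(record, text_field="content"):
    """Map step: tag one debate with the themes its transcript text mentions.

    `text_field` is a hypothetical key for the transcript text.
    """
    text = record.get(text_field, "").lower()
    themes = [t for t, words in THEME_KEYWORDS.items() if any(w in text for w in words)]
    return {"year": record["year"], "title": record["title"], "themes": themes}

def reduce_by_theme(mapped):
    """Reduce step: for each theme, collect the sorted years in which it appears."""
    years_by_theme = defaultdict(set)
    for item in mapped:
        for theme in item["themes"]:
            years_by_theme[theme].add(item["year"])
    return {theme: sorted(years) for theme, years in years_by_theme.items()}

# Placeholder records (not real transcript text)
records = [
    {"year": 2016, "title": "The Second Clinton-Trump Presidential Debate",
     "content": "placeholder text mentioning taxes"},
    {"year": 2024, "title": "The Biden-Trump Presidential Debate",
     "content": "placeholder text mentioning inflation and allies"},
]
print(reduce_by_theme([map_themes(r) for r in records]))
```

The map step handles each debate independently. Only the reduce step needs to see records from different years, so that is where change over time becomes visible.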

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: website/public/debate_transcripts.json
Lines: 394

Data Structure

[
  {
    "year": 2024,
    "date": "June 27, 2024",
    "title": "The Biden-Trump Presidential Debate"
  },
  {
    "year": 2024,
    "date": "September 10, 2024",
    "title": "The Harris-Trump Presidential Debate"
  },
  {
    "year": 2016,
    "date": "October 9, 2016",
    "title": "The Second Clinton-Trump Presidential Debate"
  }
]
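The `date` field is a human-readable string, so sorting by it or comparing it to `year` requires parsing first. The sketch below is a minimal example. It assumes every date follows the "Month D, YYYY" pattern shown in the sample. The helper names are hypothetical.

```python
from datetime import datetime

def parse_debate_date(record):
    """Parse a record's `date` string (e.g. "June 27, 2024") into a datetime.date.

    "%B" matches full English month names under the default C locale.
    """
    return datetime.strptime(record["date"], "%B %d, %Y").date()

def check_and_sort(records):
    """Raise ValueError if a record's `year` disagrees with its `date`; else return records oldest-first."""
    for rec in records:
        if parse_debate_date(rec).year != rec["year"]:
            raise ValueError(f"year/date mismatch in {rec['title']!r}")
    return sorted(records, key=parse_debate_date)

records = [
    {"year": 2024, "date": "June 27, 2024", "title": "The Biden-Trump Presidential Debate"},
    {"year": 2016, "date": "October 9, 2016", "title": "The Second Clinton-Trump Presidential Debate"},
]
for rec in check_and_sort(records):
    print(rec["date"], "-", rec["title"])
```

Sorting on the parsed date instead of `year` alone gives a well-defined order for debates held in the same year, such as the two 2024 debates.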

I/O Contract

Schema

Field	Type	Description
year	integer	Year of the debate (ranges from 1960 to 2024)
date	string	Human-readable date of the debate (e.g., "June 27, 2024")
title	string	Full descriptive title of the debate (e.g., "The Biden-Trump Presidential Debate")

Note: Each record also contains the full transcript text, which is omitted from the structure sample above due to length.
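A lightweight check against the schema above can catch malformed records before they reach a pipeline. The sketch below is a minimal example that uses the hypothetical helper `validate_record`. It checks only the three documented fields. It does not check the transcript field, because that field's name is not given here.

```python
def validate_record(rec):
    """Return a list of schema problems for one record (an empty list means it conforms)."""
    problems = []
    year = rec.get("year")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(year, int) or isinstance(year, bool):
        problems.append("year must be an integer")
    elif not 1960 <= year <= 2024:
        problems.append(f"year {year} outside 1960-2024")
    for field in ("date", "title"):
        if not isinstance(rec.get(field), str):
            problems.append(f"{field} must be a string")
    return problems

print(validate_record({"year": 2024, "date": "June 27, 2024",
                       "title": "The Biden-Trump Presidential Debate"}))  # []
```

Returning a list of problems instead of raising on the first one lets you report every defect in a record in a single pass.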

Debates Included

The dataset includes transcripts from the following debates (partial list):

2024: Biden-Trump, Harris-Trump
2016: First, Second, and Third Clinton-Trump
2008: Second and Third McCain-Obama
2004: Second Bush-Kerry, Cheney-Edwards VP
2000: First and Third Gore-Bush, Lieberman-Cheney VP
1996: First Clinton-Dole, Gore-Kemp VP, Second Presidential
1992: First and Second Clinton-Bush-Perot (each split into two halves)
1988: First and Second Bush-Dukakis, Bentsen-Quayle VP
1984: Second Reagan-Mondale
1980: Anderson-Reagan
1976: First, Second, and Third Carter-Ford
1960: Second, Third, and Fourth Kennedy-Nixon

Usage Examples

import json

# Path is relative to the repository root
with open("website/public/debate_transcripts.json", encoding="utf-8") as f:
    data = json.load(f)

# data is a list of debate records; each has year, date, and title, plus the full transcript text
print(f"Total debates: {len(data)}")
for debate in data:
    print(f"{debate['year']}: {debate['title']} ({debate['date']})")
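When testing a pipeline on this data, you may want a smaller input file containing only some of the years. The sketch below is one way to create it. The function `write_subset` and the output filename are hypothetical. The function copies every field of each matching record unchanged.

```python
import json

def write_subset(src_path, dst_path, min_year, max_year):
    """Copy records whose `year` lies in [min_year, max_year] into a new JSON file.

    Returns the number of records written; all fields of each record are kept.
    """
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    subset = [r for r in records if min_year <= r["year"] <= max_year]
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters in transcripts readable
        json.dump(subset, f, indent=2, ensure_ascii=False)
    return len(subset)
```

For example, `write_subset("website/public/debate_transcripts.json", "debates_2000_2024.json", 2000, 2024)` would keep only the debates from 2000 onward.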

Related Pages

Environment:Ucbepic_Docetl_Python_Runtime
