
Implementation:Ucbepic Docetl Dataset Debate Transcripts

From Leeroopedia


Knowledge Sources
Domains: Sample_Data, Data_Processing
Last Updated: 2026-02-08 00:00 GMT

Overview

JSON dataset providing a curated collection of U.S. presidential and vice-presidential debate transcripts spanning from 1960 to 2024, used as input data for the DocETL website demo.

Description

This file contains full-text transcripts of notable U.S. presidential and vice-presidential debates organized chronologically. Each record includes the debate year, date, title, and complete transcript text. The collection spans over six decades of American political discourse, from the Kennedy-Nixon debates of 1960 through the Biden-Trump and Harris-Trump debates of 2024. This dataset serves as the primary input for the debate theme evolution analysis pipeline demonstrated on the DocETL website.

Usage

This dataset is stored in the website/public directory and is served as a static asset on the DocETL website. It is used as the input dataset for the presidential debate theme evolution analysis demo, which processes these transcripts through DocETL map and reduce operations to generate thematic analyses of how political viewpoints have evolved over time.
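To make the shape of a map-then-reduce pass concrete, here is a minimal plain-Python sketch. It does not use the DocETL API and is not the demo pipeline: the keyword table stands in for an LLM-based theme extractor, the toy records contain made-up text, and the field name `transcript` is an assumption, since this page does not name the transcript field.

```python
from collections import defaultdict

# Hypothetical keyword table standing in for an LLM-based theme extractor.
THEME_KEYWORDS = {
    "economy": ("tax", "jobs", "inflation"),
    "foreign policy": ("treaty", "war", "alliance"),
}


def map_themes(record, text_key="transcript"):
    """Map step: tag one record with the themes whose keywords appear in its text.

    `text_key` is an assumed field name; the page does not name the transcript field.
    """
    text = record.get(text_key, "").lower()
    themes = sorted(
        theme
        for theme, words in THEME_KEYWORDS.items()
        if any(word in text for word in words)
    )
    return {"year": record["year"], "title": record["title"], "themes": themes}


def reduce_by_year(mapped):
    """Reduce step: collect the set of themes seen in each year."""
    by_year = defaultdict(set)
    for item in mapped:
        by_year[item["year"]].update(item["themes"])
    return {year: sorted(themes) for year, themes in sorted(by_year.items())}


# Toy records with made-up text, for illustration only (not real transcript excerpts).
toy = [
    {"year": 2016, "title": "Toy debate A", "transcript": "We will cut tax rates."},
    {"year": 2024, "title": "Toy debate B", "transcript": "Inflation and the alliance."},
    {"year": 2024, "title": "Toy debate C", "transcript": "More jobs."},
]

themes_by_year = reduce_by_year(map_themes(r) for r in toy)
print(themes_by_year)
```

The map step works on one record at a time, while the reduce step merges results that share a key (here, the year). That split is what allows a per-year view of themes to be built from per-debate outputs.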

Code Reference

Source Location

Data Structure

[
  {
    "year": 2024,
    "date": "June 27, 2024",
    "title": "The Biden-Trump Presidential Debate"
  },
  {
    "year": 2024,
    "date": "September 10, 2024",
    "title": "The Harris-Trump Presidential Debate"
  },
  {
    "year": 2016,
    "date": "October 9, 2016",
    "title": "The Second Clinton-Trump Presidential Debate"
  }
]

I/O Contract

Schema

| Field | Type | Description |
|-------|------|-------------|
| year | integer | Year of the debate (ranges from 1960 to 2024) |
| date | string | Human-readable date of the debate (e.g., "June 27, 2024") |
| title | string | Full descriptive title of the debate (e.g., "The Biden-Trump Presidential Debate") |

Note: Each record also contains the full transcript text, which is omitted from the structure sample above due to length.
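The schema above can be checked mechanically before the data is fed to a pipeline. The following is a minimal sketch; the helper name `validate_record` is hypothetical and not part of DocETL. It covers only the three documented fields, because the transcript field's name is not given on this page.

```python
# Documented fields and their expected Python types after json.load().
EXPECTED_FIELDS = {"year": int, "date": str, "title": str}


def validate_record(record):
    """Return a list of problems found in one record (an empty list means it passes)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif type(record[field]) is not expected_type:
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Range check taken from the schema description (1960 to 2024).
    year = record.get("year")
    if type(year) is int and not 1960 <= year <= 2024:
        problems.append(f"year out of range: {year}")
    return problems


good = {
    "year": 2024,
    "date": "June 27, 2024",
    "title": "The Biden-Trump Presidential Debate",
}
bad = {"year": "2024", "title": "The Biden-Trump Presidential Debate"}

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.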

Debates Included

The dataset includes transcripts from the following debates (partial list):

  • 2024: Biden-Trump, Harris-Trump
  • 2016: First, Second, and Third Clinton-Trump
  • 2008: Second and Third McCain-Obama
  • 2004: Second Bush-Kerry, Cheney-Edwards VP
  • 2000: First and Third Gore-Bush, Lieberman-Cheney VP
  • 1996: First Clinton-Dole, Gore-Kemp VP, Second Presidential
  • 1992: First and Second Clinton-Bush-Perot (halves)
  • 1988: First and Second Bush-Dukakis, Bentsen-Quayle VP
  • 1984: Second Reagan-Mondale
  • 1980: Anderson-Reagan
  • 1976: First, Second, and Third Carter-Ford
  • 1960: Second, Third, and Fourth Kennedy-Nixon
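A per-year listing like the one above can be derived from the records themselves. The sketch below uses the three sample records from the Data Structure section. The helper names are hypothetical, and the date format string `"%B %d, %Y"` is an assumption based on the example dates shown on this page.

```python
from collections import defaultdict
from datetime import datetime

# The three sample records from the Data Structure section.
records = [
    {"year": 2024, "date": "June 27, 2024",
     "title": "The Biden-Trump Presidential Debate"},
    {"year": 2024, "date": "September 10, 2024",
     "title": "The Harris-Trump Presidential Debate"},
    {"year": 2016, "date": "October 9, 2016",
     "title": "The Second Clinton-Trump Presidential Debate"},
]


def parse_date(record):
    """Parse the human-readable date string, e.g. "June 27, 2024"."""
    return datetime.strptime(record["date"], "%B %d, %Y")


def titles_by_year(records):
    """Group debate titles by year, with each year's titles in date order."""
    grouped = defaultdict(list)
    for record in sorted(records, key=parse_date):
        grouped[record["year"]].append(record["title"])
    return dict(grouped)


grouped = titles_by_year(records)
for year in sorted(grouped, reverse=True):
    print(f"{year}: {', '.join(grouped[year])}")
```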

Usage Examples

import json

with open("website/public/debate_transcripts.json", encoding="utf-8") as f:
    data = json.load(f)
# data is a list of debate records; each has year, date, and title,
# plus the full transcript text
print(f"Total debates: {len(data)}")
for debate in data:
    print(f"{debate['year']}: {debate['title']} ({debate['date']})")
