
Implementation:Ucbepic Docetl Dataset Fatal Accidents

From Leeroopedia


Knowledge Sources
Domains: Sample_Data, Data_Processing
Last Updated: 2026-02-08 00:00 GMT

Overview

JSON dataset providing NTSB (National Transportation Safety Board) fatal aviation accident records for use as sample data in DocETL tutorials and documentation.

Description

This file contains a large collection of fatal aviation accident investigation records sourced from the NTSB database. Each record includes detailed metadata about the accident event such as location, date, aircraft information, injury counts, weather conditions, probable cause findings, and links to official PDF reports. The dataset serves as a real-world example for demonstrating DocETL's document processing and analysis capabilities in the project's documentation site.

Usage

This dataset is stored in the docs/assets directory (the example on this page loads it as docs/assets/fatal.json) and serves as sample input for the DocETL documentation tutorials. It demonstrates how DocETL can process and analyze a large set of structured government safety records.

Code Reference

Source Location

Data Structure

[
  {
    "NtsbNo": "ERA25FA103",
    "EventType": "ACC",
    "Mkey": 199596,
    "EventDate": "2025-01-25T13:35:00Z",
    "City": "Charlottesville",
    "State": "Virginia",
    "Country": "United States",
    "ReportNo": null,
    "N#": "N2UZ",
    "SerialNumber": "D-9980",
    "HasSafetyRec": false,
    "Mode": "Aviation",
    "HighestInjuryLevel": "Fatal",
    "FatalInjuryCount": 1,
    "SeriousInjuryCount": 0,
    "MinorInjuryCount": 0,
    "OnboardInjuryCount": 1.0,
    "OnGroundInjuryCount": 0.0,
    "Latitude": 38.096676,
    "Longitude ": -78.454202,
    "Make": "BEECH",
    "Model": "V35B",
    "AirCraftCategory": "AIR",
    "AirportID": "CHO",
    "AirportName": "Charlottesville-Albemarle Airport",
    "AmateurBuilt": "false",
    "NumberOfEngines": "1",
    "PurposeOfFlight": "PERS",
    "AirCraftDamage": "Destroyed",
    "WeatherCondition": "VMC",
    "BroadPhaseofFlight": "Enroute",
    "ReportStatus": "In work",
    "PdfPath": "https://..."
  }
]
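The sample record above stores some values as strings ("AmateurBuilt": "false", "NumberOfEngines": "1") and shows the longitude key as "Longitude " with a trailing space. The following is a minimal sketch of how a record might be normalized before analysis. The helper name normalize_record is hypothetical and not part of DocETL, and whether the trailing space exists in the actual file is an assumption based only on the sample shown here; if it does not, the key-stripping step is a no-op.

```python
from datetime import datetime


def normalize_record(record: dict) -> dict:
    """Return a copy with stripped keys and a few string fields cast to native types.

    Hypothetical helper for illustration; not part of DocETL.
    """
    # Strip stray whitespace from keys (the sample shows "Longitude " with a trailing space).
    clean = {key.strip(): value for key, value in record.items()}

    # "AmateurBuilt" appears as the string "false" rather than a JSON boolean.
    if isinstance(clean.get("AmateurBuilt"), str):
        clean["AmateurBuilt"] = clean["AmateurBuilt"].strip().lower() == "true"

    # "NumberOfEngines" appears as the string "1"; cast only when it is purely numeric.
    engines = clean.get("NumberOfEngines")
    if isinstance(engines, str) and engines.strip().isdigit():
        clean["NumberOfEngines"] = int(engines.strip())

    # Parse the ISO 8601 timestamp. Replacing the trailing "Z" keeps this
    # working on Python versions older than 3.11.
    event_date = clean.get("EventDate")
    if isinstance(event_date, str):
        clean["EventDate"] = datetime.fromisoformat(event_date.replace("Z", "+00:00"))

    return clean


# A subset of the sample record shown above.
sample = {
    "NtsbNo": "ERA25FA103",
    "EventDate": "2025-01-25T13:35:00Z",
    "Latitude": 38.096676,
    "Longitude ": -78.454202,
    "AmateurBuilt": "false",
    "NumberOfEngines": "1",
}

normalized = normalize_record(sample)
print(normalized["Longitude"], normalized["AmateurBuilt"], normalized["NumberOfEngines"])
print(normalized["EventDate"].isoformat())
```

The function returns a new dictionary and leaves the input untouched, so the raw record remains available for comparison.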

I/O Contract

Schema

Field | Type | Description
------|------|------------
NtsbNo | string | NTSB investigation number identifier
EventType | string | Type of event (e.g., "ACC" for accident)
Mkey | integer | Internal NTSB record key
EventDate | string (ISO 8601) | Date and time of the accident event
City | string | City where the accident occurred
State | string | State where the accident occurred
Country | string | Country where the accident occurred
HighestInjuryLevel | string | Severity classification (e.g., "Fatal")
FatalInjuryCount | integer | Number of fatal injuries
SeriousInjuryCount | integer | Number of serious injuries
Latitude | float | Geographic latitude of accident location
Longitude | float | Geographic longitude of accident location
Make | string | Aircraft manufacturer
Model | string | Aircraft model designation
AirCraftCategory | string | Category of aircraft (e.g., "AIR")
AirportID | string | Nearest airport identifier code
AirportName | string | Nearest airport name
WeatherCondition | string | Weather conditions (e.g., "VMC", "IMC")
BroadPhaseofFlight | string | Phase of flight during accident
PdfPath | string | URL to the official NTSB report PDF

Usage Examples

import json

# Path is relative to the repository root.
with open("docs/assets/fatal.json", encoding="utf-8") as f:
    data = json.load(f)

# data is a list of NTSB fatal accident records (dicts) with fields such as
# NtsbNo, EventDate, City, State, Make, Model, FatalInjuryCount.
print(f"Total records: {len(data)}")
first = data[0]
print(f"First accident: {first['NtsbNo']} in {first['City']}, {first['State']}")
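Once the records are loaded as a list of dictionaries, simple aggregations need only the standard library. The sketch below counts records per state and per aircraft make and sums fatal injuries. The summarize function is a hypothetical helper, and the two PLACEHOLDER records are invented solely so the snippet runs without the data file; they are not real NTSB entries. Treating a null State as "Unknown" and a null count as zero is also an assumption.

```python
from collections import Counter


def summarize(records: list) -> dict:
    """Aggregate a list of accident records (hypothetical helper, not part of DocETL)."""
    return {
        # Missing or null values are grouped under "Unknown" (an assumption).
        "by_state": Counter(r.get("State") or "Unknown" for r in records),
        "by_make": Counter(r.get("Make") or "Unknown" for r in records),
        # Null or missing counts are treated as zero (an assumption).
        "total_fatalities": sum(r.get("FatalInjuryCount") or 0 for r in records),
    }


records = [
    # Taken from the sample record on this page.
    {"NtsbNo": "ERA25FA103", "State": "Virginia", "Make": "BEECH", "FatalInjuryCount": 1},
    # Invented placeholder records, for illustration only.
    {"NtsbNo": "PLACEHOLDER-1", "State": "Virginia", "Make": "BEECH", "FatalInjuryCount": 2},
    {"NtsbNo": "PLACEHOLDER-2", "State": None, "Make": "CESSNA", "FatalInjuryCount": None},
]

summary = summarize(records)
for state, count in summary["by_state"].most_common():
    print(f"{state}: {count}")
print(f"Total fatalities: {summary['total_fatalities']}")
```

To run this against the real dataset, pass the list returned by json.load in place of the inline records.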

Related Pages
