Implementation:Ucbepic Docetl Dataset Fatal Accidents
| Knowledge Sources | |
|---|---|
| Domains | Sample_Data, Data_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
JSON dataset providing NTSB (National Transportation Safety Board) fatal aviation accident records for use as sample data in DocETL tutorials and documentation.
Description
This file contains fatal aviation accident investigation records sourced from the NTSB database, spanning 33,763 lines of JSON. Each record carries metadata about the accident event: location, date, aircraft information, injury counts, weather conditions, probable cause findings, and links to official PDF reports. The dataset serves as a real-world example for demonstrating DocETL's document processing and analysis capabilities on the project's documentation site.
Usage
This dataset is stored in the docs/assets directory and serves as sample input for the DocETL documentation tutorials, showing how DocETL can process and analyze structured government safety data.
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docs/assets/fatal.json
- Lines: 33763
Data Structure
[
  {
    "NtsbNo": "ERA25FA103",
    "EventType": "ACC",
    "Mkey": 199596,
    "EventDate": "2025-01-25T13:35:00Z",
    "City": "Charlottesville",
    "State": "Virginia",
    "Country": "United States",
    "ReportNo": null,
    "N#": "N2UZ",
    "SerialNumber": "D-9980",
    "HasSafetyRec": false,
    "Mode": "Aviation",
    "HighestInjuryLevel": "Fatal",
    "FatalInjuryCount": 1,
    "SeriousInjuryCount": 0,
    "MinorInjuryCount": 0,
    "OnboardInjuryCount": 1.0,
    "OnGroundInjuryCount": 0.0,
    "Latitude": 38.096676,
    "Longitude ": -78.454202,
    "Make": "BEECH",
    "Model": "V35B",
    "AirCraftCategory": "AIR",
    "AirportID": "CHO",
    "AirportName": "Charlottesville-Albemarle Airport",
    "AmateurBuilt": "false",
    "NumberOfEngines": "1",
    "PurposeOfFlight": "PERS",
    "AirCraftDamage": "Destroyed",
    "WeatherCondition": "VMC",
    "BroadPhaseofFlight": "Enroute",
    "ReportStatus": "In work",
    "PdfPath": "https://..."
  }
]
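The sample record stores a few values as strings rather than native JSON types: `AmateurBuilt` is `"false"`, `NumberOfEngines` is `"1"`, and `EventDate` is an ISO 8601 string with a trailing `Z`. The following is a minimal sketch of converting them after loading, assuming the rest of the file follows the same encodings as the sample. `normalize_record` is a hypothetical helper written for this page, not part of DocETL.

```python
from datetime import datetime


def normalize_record(record: dict) -> dict:
    """Return a copy of one record with a few fields converted to native types."""
    out = dict(record)

    # EventDate ends in "Z" (UTC); swap it for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    event_date = out.get("EventDate")
    if isinstance(event_date, str):
        out["EventDate"] = datetime.fromisoformat(event_date.replace("Z", "+00:00"))

    # AmateurBuilt is the string "false" in the sample, not a JSON boolean.
    amateur = out.get("AmateurBuilt")
    if isinstance(amateur, str):
        out["AmateurBuilt"] = amateur.strip().lower() == "true"

    # NumberOfEngines is a numeric string in the sample; convert only if it parses.
    engines = out.get("NumberOfEngines")
    if isinstance(engines, str) and engines.strip().isdigit():
        out["NumberOfEngines"] = int(engines)

    return out


# Subset of the sample record from the Data Structure section.
sample = {
    "NtsbNo": "ERA25FA103",
    "EventDate": "2025-01-25T13:35:00Z",
    "AmateurBuilt": "false",
    "NumberOfEngines": "1",
}
normalized = normalize_record(sample)
print(normalized["EventDate"], normalized["AmateurBuilt"], normalized["NumberOfEngines"])
```

The helper copies the record instead of mutating it, so the original strings stay available if a value fails to convert.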
I/O Contract
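Schema
The table below documents a subset of the fields. The sample record also includes others (for example ReportNo, N#, SerialNumber, MinorInjuryCount, AmateurBuilt, NumberOfEngines) that are not listed here.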
| Field | Type | Description |
|---|---|---|
| NtsbNo | string | NTSB investigation number identifier |
| EventType | string | Type of event (e.g., "ACC" for accident) |
| Mkey | integer | Internal NTSB record key |
| EventDate | string (ISO 8601) | Date and time of the accident event |
| City | string | City where the accident occurred |
| State | string | State where the accident occurred |
| Country | string | Country where the accident occurred |
| HighestInjuryLevel | string | Severity classification (e.g., "Fatal") |
| FatalInjuryCount | integer | Number of fatal injuries |
| SeriousInjuryCount | integer | Number of serious injuries |
| Latitude | float | Geographic latitude of accident location |
| Longitude | float | Geographic longitude of accident location |
| Make | string | Aircraft manufacturer |
| Model | string | Aircraft model designation |
| AirCraftCategory | string | Category of aircraft (e.g., "AIR") |
| AirportID | string | Nearest airport identifier code |
| AirportName | string | Nearest airport name |
| WeatherCondition | string | Weather conditions (e.g., "VMC", "IMC") |
| BroadPhaseofFlight | string | Phase of flight during accident |
| PdfPath | string | URL to the official NTSB report PDF |
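The types in the schema table can be checked mechanically before feeding records into a pipeline. The sketch below is one possible check, not a validator shipped with DocETL: `EXPECTED_TYPES` and `check_record` are hypothetical names, the field list is a subset of the table, and treating `null` as acceptable for every field is an assumption (the sample only shows `null` for `ReportNo`). Keys are stripped of surrounding whitespace because the sample record spells the longitude key as `"Longitude "` with a trailing space.

```python
# Expected JSON types for a subset of fields, taken from the schema table.
# Float fields also accept integers, since JSON does not distinguish them.
EXPECTED_TYPES = {
    "NtsbNo": str,
    "EventType": str,
    "Mkey": int,
    "EventDate": str,
    "City": str,
    "State": str,
    "FatalInjuryCount": int,
    "SeriousInjuryCount": int,
    "Latitude": (int, float),
    "Longitude": (int, float),
    "Make": str,
    "Model": str,
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    # Normalize keys so "Longitude " and "Longitude" are treated alike.
    cleaned = {key.strip(): value for key, value in record.items()}
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in cleaned:
            problems.append(f"{field}: missing")
            continue
        value = cleaned[field]
        if value is None:
            continue  # Assumption: null is acceptable for any field.
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: unexpected type {type(value).__name__}")
    return problems


# Values copied from the sample record in the Data Structure section.
good = {
    "NtsbNo": "ERA25FA103", "EventType": "ACC", "Mkey": 199596,
    "EventDate": "2025-01-25T13:35:00Z", "City": "Charlottesville",
    "State": "Virginia", "FatalInjuryCount": 1, "SeriousInjuryCount": 0,
    "Latitude": 38.096676, "Longitude ": -78.454202,
    "Make": "BEECH", "Model": "V35B",
}
# A deliberately broken copy: Mkey as a string and City removed.
bad = {k: v for k, v in good.items() if k != "City"}
bad["Mkey"] = "199596"

print(check_record(good))
print(check_record(bad))
```

Returning a list of messages rather than raising on the first mismatch makes it easier to summarize problems across the whole file.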
Usage Examples
import json

# The path is relative to the repository root.
with open("docs/assets/fatal.json", encoding="utf-8") as f:
    data = json.load(f)

# data is a list of NTSB fatal accident records with fields:
# NtsbNo, EventDate, City, State, Make, Model, FatalInjuryCount, etc.
print(f"Total records: {len(data)}")
first = data[0]
print(f"First accident: {first['NtsbNo']} in {first['City']}, {first['State']}")
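Once the list is loaded, simple aggregations need only the standard library. The sketch below sums `FatalInjuryCount` per value of a chosen field. `fatalities_by` is a hypothetical helper, and the two `PLACEHOLDER` records are made-up stand-ins to keep the example self-contained; they are not real NTSB records. Grouping missing values under `"Unknown"` and counting a `null` fatality count as zero are assumptions.

```python
from collections import Counter


def fatalities_by(records: list, field: str) -> Counter:
    """Sum FatalInjuryCount for each distinct value of `field`."""
    totals = Counter()
    for record in records:
        # Assumption: missing or null group values are bucketed as "Unknown".
        key = record.get(field) or "Unknown"
        # Assumption: a missing or null count contributes zero.
        totals[key] += record.get("FatalInjuryCount") or 0
    return totals


records = [
    # Values from the sample record in the Data Structure section.
    {"NtsbNo": "ERA25FA103", "State": "Virginia",
     "WeatherCondition": "VMC", "FatalInjuryCount": 1},
    # Made-up placeholders for illustration only, not real NTSB records.
    {"NtsbNo": "PLACEHOLDER-1", "State": "Virginia",
     "WeatherCondition": "IMC", "FatalInjuryCount": 2},
    {"NtsbNo": "PLACEHOLDER-2", "State": None,
     "WeatherCondition": "VMC", "FatalInjuryCount": None},
]

by_state = fatalities_by(records, "State")
by_weather = fatalities_by(records, "WeatherCondition")
print(by_state.most_common())
print(by_weather.most_common())
```

With the real file, `records` would be the `data` list returned by `json.load`.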