Implementation:Unstructured IO Unstructured Example Docs Fixture
| Knowledge Sources | |
|---|---|
| Domains | Testing, Document_Processing, Examples |
| Last Updated | 2026-02-12 09:30 GMT |
Overview
Example document fixture providing pre-parsed JSON element output for an HTML weather document used in integration tests and examples.
Description
The `spring-weather.html.json` file is the JSON representation of the parsed HTML content from the Spring Weather NOAA page. Its 1,178 lines hold a JSON array of structured element objects, including titles, narrative text, and list items. Unlike the test fixture files under `test_unstructured_ingest/`, this file lives in the `example-docs/` directory and serves as both a test fixture and a demonstration of the Unstructured element JSON output format.
Usage
Reference this file when demonstrating the JSON output format of the Unstructured partitioning pipeline, or as input to downstream processing that consumes element JSON.
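As a rough illustration of downstream processing that consumes element JSON, the sketch below groups element text by element type. The helper name `texts_by_type` and the inline sample are hypothetical; only the "News Around NOAA" title comes from the fixture, and the narrative entry is a placeholder.

```python
import json
from collections import defaultdict
from typing import Dict, List


def texts_by_type(elements: List[dict]) -> Dict[str, List[str]]:
    """Group the "text" of each element dict under its "type"."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        grouped[element["type"]].append(element.get("text", ""))
    return dict(grouped)


# Hypothetical sample in the fixture's shape; the NarrativeText entry is a placeholder.
sample = json.loads(
    """
    [
      {"type": "Title", "text": "News Around NOAA"},
      {"type": "NarrativeText", "text": "Placeholder narrative text."}
    ]
    """
)

for element_type, texts in texts_by_type(sample).items():
    print(element_type, texts)
```

The same function should work on the list returned by `json.load` for `example-docs/spring-weather.html.json`, assuming every element carries a `type` key as in the sample shown on this page.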
Code Reference
Source Location
- Repository: Unstructured_IO_Unstructured
- File: example-docs/spring-weather.html.json
- Lines: 1-1178
Signature
[
  {
    "type": "Title",
    "element_id": "fb902c5b26b38e2d35a70a55d43a5de6",
    "text": "News Around NOAA",
    "metadata": {
      "languages": ["eng"],
      "filetype": "text/html",
      "data_source": {
        "url": "abfs://container1/spring-weather.html",
        "version": "162215905222974206637545574128436022861",
        "record_locator": {
          "protocol": "abfs",
          "remote_file_path": "abfs://container1/"
        },
        "date_created": "1678441216.0",
        "date_modified": "1678441216.0"
      }
    }
  }
]
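The `date_created` and `date_modified` values are stored as strings. A minimal sketch for turning them into datetimes follows, assuming they are Unix epoch seconds in UTC; the page does not state this, so treat it as an assumption. The helper name `parse_epoch_string` is hypothetical.

```python
from datetime import datetime, timezone


def parse_epoch_string(value: str) -> datetime:
    """Convert a string such as "1678441216.0" to a timezone-aware UTC datetime.

    Assumes the string holds Unix epoch seconds (an assumption, not documented here).
    """
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


# Values copied from the data_source block in the signature above.
data_source = {"date_created": "1678441216.0", "date_modified": "1678441216.0"}

created = parse_epoch_string(data_source["date_created"])
print(created.isoformat())  # 2023-03-10T09:40:16+00:00
```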
Import
# Load the example fixture (path is relative to the repository root)
import json

with open("example-docs/spring-weather.html.json", encoding="utf-8") as f:
    elements = json.load(f)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| JSON file path | str | Yes | Path to example-docs/spring-weather.html.json |
Outputs
| Name | Type | Description |
|---|---|---|
| elements | List[Dict] | JSON array of element dicts with type, element_id, text, metadata |
| metadata.data_source | Dict | Azure Blob Storage provenance (abfs protocol) |
| metadata.filetype | str | "text/html" |
| metadata.languages | List[str] | ["eng"] |
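The output contract above can be checked mechanically. The sketch below is one possible check, not part of the Unstructured library: `contract_violations` and `REQUIRED_KEYS` are hypothetical names, and it assumes the `filetype` and `languages` values in the table hold for every element being checked.

```python
from typing import Any, Dict, List

# Top-level keys listed for each element dict in the Outputs table.
REQUIRED_KEYS = ("type", "element_id", "text", "metadata")


def contract_violations(element: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the element conforms."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in element]
    metadata = element.get("metadata", {})
    if metadata.get("filetype") != "text/html":
        problems.append("metadata.filetype is not 'text/html'")
    if metadata.get("languages") != ["eng"]:
        problems.append("metadata.languages is not ['eng']")
    if not isinstance(metadata.get("data_source"), dict):
        problems.append("metadata.data_source is not a dict")
    return problems


# Abbreviated copy of the first element shown in the signature.
first = {
    "type": "Title",
    "element_id": "fb902c5b26b38e2d35a70a55d43a5de6",
    "text": "News Around NOAA",
    "metadata": {
        "languages": ["eng"],
        "filetype": "text/html",
        "data_source": {"url": "abfs://container1/spring-weather.html"},
    },
}

print(contract_violations(first))
```

Running the check over the whole list returned by `json.load` would show whether every element in the file matches the table, which this page does not itself confirm.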
Usage Examples
Exploring Element Types
import json
from collections import Counter

with open("example-docs/spring-weather.html.json", encoding="utf-8") as f:
    elements = json.load(f)

# Count how many elements of each type the fixture contains
counts = Counter(e["type"] for e in elements)

# Print the types from most to least frequent
for element_type, count in counts.most_common():
    print(f"{element_type}: {count}")