Implementation:Unstructured IO Unstructured Example Docs Fixture

From Leeroopedia
Knowledge Sources
Domains Testing, Document_Processing, Examples
Last Updated 2026-02-12 09:30 GMT

Overview

Example document fixture providing pre-parsed JSON element output for an HTML weather document used in integration tests and examples.

Description

The `spring-weather.html.json` file is the JSON representation of the parsed HTML content of the Spring Weather NOAA page. Its 1,178 lines hold a JSON array of structured element objects, including titles, narrative text, and list items. Unlike the test fixture files under `test_unstructured_ingest/`, this file lives in the `example-docs/` directory and serves as both a test fixture and a demonstration of the Unstructured element JSON output format.

Usage

Reference this file when demonstrating the JSON output format of the Unstructured partitioning pipeline, or as input to downstream processing that consumes element JSON.
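
As a minimal sketch of downstream processing that consumes element JSON, the helper below joins element text into one string, optionally restricted to certain element types. The name `elements_to_text` is hypothetical, and the second and third sample entries are placeholders rather than fixture content; in practice the list would come from `json.load` on the fixture file.

```python
from typing import Any, Dict, Iterable, List, Optional


def elements_to_text(
    elements: List[Dict[str, Any]],
    types: Optional[Iterable[str]] = None,
) -> str:
    """Join the "text" of each element, one element per line.

    If `types` is given, only elements whose "type" is in it are kept.
    """
    wanted = set(types) if types is not None else None
    lines = [
        e["text"]
        for e in elements
        if wanted is None or e["type"] in wanted
    ]
    return "\n".join(lines)


# Illustrative input: the first entry mirrors the fixture's Title element;
# the other two are hypothetical placeholders, not fixture content.
sample = [
    {"type": "Title", "text": "News Around NOAA"},
    {"type": "NarrativeText", "text": "Placeholder paragraph."},
    {"type": "ListItem", "text": "Placeholder list item."},
]

titles_only = elements_to_text(sample, types=["Title"])
everything = elements_to_text(sample)
```

Passing the element list in, rather than a file path, keeps the helper usable with any source of element dicts.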

Code Reference

Source Location

Signature

[
  {
    "type": "Title",
    "element_id": "fb902c5b26b38e2d35a70a55d43a5de6",
    "text": "News Around NOAA",
    "metadata": {
      "languages": ["eng"],
      "filetype": "text/html",
      "data_source": {
        "url": "abfs://container1/spring-weather.html",
        "version": "162215905222974206637545574128436022861",
        "record_locator": {
          "protocol": "abfs",
          "remote_file_path": "abfs://container1/"
        },
        "date_created": "1678441216.0",
        "date_modified": "1678441216.0"
      }
    }
  }
]
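
The element shape above can be checked with a small helper. This is a sketch that assumes the four top-level keys shown are present on every element in this fixture; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of the Unstructured library.

```python
from typing import Any, Dict, List

# Top-level keys visible in the element shown above (assumed required here).
REQUIRED_KEYS = ("type", "element_id", "text", "metadata")


def missing_keys(element: Dict[str, Any]) -> List[str]:
    """Return the required top-level keys that `element` lacks."""
    return [key for key in REQUIRED_KEYS if key not in element]


# The Title element from the signature, with metadata abbreviated.
title = {
    "type": "Title",
    "element_id": "fb902c5b26b38e2d35a70a55d43a5de6",
    "text": "News Around NOAA",
    "metadata": {"languages": ["eng"], "filetype": "text/html"},
}

complete = missing_keys(title)                # no keys missing
incomplete = missing_keys({"type": "Title"})  # lacks the other three keys
```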

Import

import json

# Load the example fixture (a JSON array of element dicts).
with open("example-docs/spring-weather.html.json", encoding="utf-8") as f:
    elements = json.load(f)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| JSON file path | str | Yes | Path to `example-docs/spring-weather.html.json` |

Outputs

| Name | Type | Description |
|---|---|---|
| elements | List[Dict] | JSON array of element dicts with `type`, `element_id`, `text`, `metadata` |
| metadata.data_source | Dict | Azure Blob Storage provenance (abfs protocol) |
| metadata.filetype | str | `"text/html"` |
| metadata.languages | List[str] | `["eng"]` |
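
To illustrate the `metadata.data_source` fields, the sketch below reads the provenance of one element and converts its modification time to a timezone-aware datetime. It assumes `date_modified` is a string of Unix epoch seconds, as the value in the signature suggests; `describe_source` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def describe_source(element: Dict[str, Any]) -> Tuple[str, str, datetime]:
    """Return (protocol, url, modified time in UTC) for one element."""
    source = element["metadata"]["data_source"]
    protocol = source["record_locator"]["protocol"]
    # Assumption: the timestamp is Unix epoch seconds stored as a string.
    modified = datetime.fromtimestamp(float(source["date_modified"]), tz=timezone.utc)
    return protocol, source["url"], modified


# Provenance values copied from the element shown in the signature.
element = {
    "type": "Title",
    "metadata": {
        "data_source": {
            "url": "abfs://container1/spring-weather.html",
            "record_locator": {
                "protocol": "abfs",
                "remote_file_path": "abfs://container1/",
            },
            "date_created": "1678441216.0",
            "date_modified": "1678441216.0",
        }
    },
}

protocol, url, modified = describe_source(element)
```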

Usage Examples

Exploring Element Types

import json
from collections import Counter

with open("example-docs/spring-weather.html.json", encoding="utf-8") as f:
    elements = json.load(f)

# Count element types
counts = Counter(e["type"] for e in elements)
for element_type, count in counts.most_common():
    print(f"{element_type}: {count}")
