Implementation:Unstructured IO Unstructured Golden File Fixtures Collaboration

From Leeroopedia
Knowledge Sources
Domains Testing, Collaboration, Ingest
Last Updated 2026-02-12 09:30 GMT

Overview

Golden file test fixtures containing expected JSON element output for collaboration platform ingest connectors (Confluence, Jira, Notion, Salesforce).

Description

These JSON files represent the expected structured output from processing documents through collaboration and CRM platform ingest connectors. Each file is a JSON array of element objects conforming to the Unstructured element schema. They serve as regression baselines: the CI pipeline diffs actual connector output against these golden files, so an unintended change in parsing behavior shows up as a diff.

The collaboration connectors covered include:

  • Confluence — Wiki pages from multiple spaces (6 files across MFS and testteamsp spaces)
  • Jira — Issue content from project boards (2 files)
  • Notion — Database and page content (2 files)
  • Salesforce — Campaign XML records (4 files)

Usage

These fixtures are consumed by the ingest test scripts (e.g., `test_unstructured_ingest/src/confluence-diff.sh`, `test_unstructured_ingest/src/salesforce.sh`) and the diff-checking utilities those scripts call. Update a fixture only when a change in parsing behavior is intentional; an unexpected diff should be treated as a regression rather than overwritten.
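The directory-level check that those scripts perform can be approximated in Python. The following is a minimal sketch, not the repository's actual diff utility: `find_mismatches` and its two directory arguments are hypothetical names, and the sketch assumes that actual output mirrors the relative paths of the golden-file tree.

```python
import json
from pathlib import Path


def find_mismatches(expected_dir, actual_dir):
    """Return relative paths whose parsed JSON is missing or differs.

    Hypothetical helper: assumes actual output uses the same relative
    paths and file names as the golden-file tree.
    """
    expected_root = Path(expected_dir)
    actual_root = Path(actual_dir)
    mismatches = []
    for expected_path in sorted(expected_root.rglob("*.json")):
        relative = expected_path.relative_to(expected_root)
        actual_path = actual_root / relative
        if not actual_path.is_file():
            mismatches.append(f"{relative.as_posix()}: missing from actual output")
            continue
        expected = json.loads(expected_path.read_text(encoding="utf-8"))
        actual = json.loads(actual_path.read_text(encoding="utf-8"))
        # Compare parsed JSON so whitespace and key order do not matter.
        if expected != actual:
            mismatches.append(f"{relative.as_posix()}: content differs")
    return mismatches
```

An empty return value means every golden file has a matching counterpart. Comparing parsed JSON rather than raw bytes is a design choice of this sketch; a byte-level diff would also flag formatting changes.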

Code Reference

Source Location

Files Covered

| Connector | File | Lines |
|---|---|---|
| Confluence | confluence-diff/MFS/1540126.json | 341 |
| Confluence | confluence-diff/MFS/1605956.json | 924 |
| Confluence | confluence-diff/MFS/229477.json | 1058 |
| Confluence | confluence-diff/testteamsp/1605859.json | 1058 |
| Confluence | confluence-diff/testteamsp/1605989.json | 815 |
| Confluence | confluence-diff/testteamsp/1802252.json | 815 |
| Jira | jira-diff/1/10000.json | 464 |
| Jira | jira-diff/1/10001.json | 310 |
| Notion | notion/b2a12157-721e-4207-b3b7-527762b782c2.json | 356 |
| Notion | notion/c47a4566-4c7a-488b-ac2a-1292ee507fcb.json | 631 |
| Salesforce | salesforce/Campaign/701Hu000001eX9EIAU.xml.json | 702 |
| Salesforce | salesforce/Campaign/701Hu000001eX9FIAU.xml.json | 702 |
| Salesforce | salesforce/Campaign/701Hu000001eX9GIAU.xml.json | 702 |
| Salesforce | salesforce/Campaign/701Hu000001eX9HIAU.xml.json | 702 |

Signature

[
  {
    "type": "NarrativeText",
    "element_id": "hex_string",
    "text": "Content from collaboration platform",
    "metadata": {
      "languages": ["eng"],
      "filetype": "application/xml",
      "data_source": {
        "url": "platform_specific_url",
        "record_locator": {
          "protocol": "confluence|jira|notion|salesforce"
        },
        "date_created": "unix_timestamp",
        "date_modified": "unix_timestamp"
      }
    }
  }
]
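The shape shown above can be checked mechanically before a fixture is committed. The sketch below is illustrative only: `element_problems` and `REQUIRED_KEYS` are hypothetical names, and treating `languages` and `data_source` as optional is an assumption rather than something the Unstructured schema is confirmed to specify.

```python
# Top-level keys shown in the signature above.
REQUIRED_KEYS = ("type", "element_id", "text", "metadata")


def element_problems(element):
    """List the ways one element dict departs from the signature's shape."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in element]
    metadata = element.get("metadata", {})
    if not isinstance(metadata, dict):
        problems.append("metadata is not an object")
        return problems
    # Assumption: languages and data_source may be absent, but when present
    # they must have the container types shown in the signature.
    languages = metadata.get("languages")
    if languages is not None and not isinstance(languages, list):
        problems.append("metadata.languages is not a list")
    data_source = metadata.get("data_source")
    if data_source is not None and not isinstance(data_source, dict):
        problems.append("metadata.data_source is not an object")
    return problems
```

Running this over every element of a loaded golden file and collecting the non-empty results gives a quick sanity check that does not depend on any connector being available.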

Import

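# Golden files are JSON data rather than a Python module, so there is nothing
# to import; the test scripts read them from disk. To inspect one manually:
import json

with open("test_unstructured_ingest/expected-structured-output/confluence-diff/MFS/229477.json", encoding="utf-8") as f:
    expected_elements = json.load(f)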

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| JSON file path | str | Yes | Path to a golden file under expected-structured-output/ |

Outputs

| Name | Type | Description |
|---|---|---|
| elements | List[Dict] | JSON array of element dicts with type, element_id, text, and metadata |
| metadata.data_source | Dict | Platform-specific provenance (URL, record locator, timestamps) |
| metadata.filetype | str | MIME type of the original document |
| metadata.languages | List[str] | Detected languages (ISO 639 codes) |
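The output fields listed above can be summarized for a loaded fixture to get a quick picture of what it contains. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, and it assumes `protocol` sits under `data_source.record_locator` as in the signature, with any of these keys possibly absent.

```python
from collections import Counter


def summarize(elements):
    """Count element types and collect the filetypes and protocols seen."""
    type_counts = Counter(element.get("type", "<missing>") for element in elements)
    filetypes = set()
    protocols = set()
    for element in elements:
        metadata = element.get("metadata") or {}
        if "filetype" in metadata:
            filetypes.add(metadata["filetype"])
        # data_source and record_locator may be absent or null; fall back to {}.
        data_source = metadata.get("data_source") or {}
        locator = data_source.get("record_locator") or {}
        if "protocol" in locator:
            protocols.add(locator["protocol"])
    return type_counts, filetypes, protocols
```

For a fixture produced by a single connector, one would expect at most one protocol in the result; more than one would suggest the file was assembled from mixed sources.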

Usage Examples

Comparing Actual Output Against Golden File

import json

# Load the expected (golden) output and the actual output of a test run
with open("test_unstructured_ingest/expected-structured-output/confluence-diff/MFS/229477.json", encoding="utf-8") as f:
    expected = json.load(f)

with open("/tmp/actual-output/229477.json", encoding="utf-8") as f:
    actual = json.load(f)

# Compare element counts first, so the pairwise loop below cannot silently
# skip trailing elements when one list is longer than the other
assert len(actual) == len(expected), f"Element count mismatch: {len(actual)} vs {len(expected)}"

# Compare the type and text of each element, position by position
for i, (exp, act) in enumerate(zip(expected, actual)):
    assert exp["type"] == act["type"], f"Element {i}: type mismatch ({exp['type']} vs {act['type']})"
    assert exp["text"] == act["text"], f"Element {i}: text mismatch"
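When the assertions above fail, a line-oriented diff of the `text` fields often makes the change easier to read than a bare index. The following sketch uses the standard library's `difflib.unified_diff`; `text_diff` is a hypothetical helper name, and it treats each element's text as a single diff line.

```python
import difflib


def text_diff(expected, actual):
    """Return unified-diff lines over the "text" field of two element lists."""
    expected_lines = [element.get("text", "") for element in expected]
    actual_lines = [element.get("text", "") for element in actual]
    return list(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile="expected",
            tofile="actual",
            lineterm="",  # element texts carry no trailing newline
        )
    )
```

Printing the result with `print("\n".join(text_diff(expected, actual)))` gives output in the familiar unified-diff form; an empty list means the text sequences are identical.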
