

Implementation:Unstructured IO Unstructured Check Diff Expected Output

From Leeroopedia
Knowledge Sources
Domains: Testing, Quality_Assurance, CI_CD
Last Updated: 2026-02-12 00:00 GMT

Overview

Concrete tooling for validating ingest pipeline output against expected baselines: a diff comparison of output contents plus a check on the number of output files.

Description

The check-diff-expected-output.sh and check-num-files-output.sh scripts provide the validation layer for the ingest pipeline integration tests. The diff script compares every file in the actual output directory against the corresponding file in the expected output directory and reports a unified diff for any mismatch. The file count script verifies that the correct number of output files was produced.
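To make the comparison concrete, here is a minimal sketch of the diff step as a bash function. It is not the repository's script: the function name `compare_dirs` and its two-directory interface are hypothetical, and the sketch assumes `diff` with `-r`/`-u` support and, optionally, `diffstat` on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the content-diff check; not the repository's script.
# compare_dirs EXPECTED_DIR ACTUAL_DIR
#   Returns 0 when both trees are identical, 1 otherwise.
compare_dirs() {
  local expected_dir=$1 actual_dir=$2
  local report
  # -r recurses into subdirectories, -u emits unified diff hunks.
  # diff exits 0 (same), 1 (different) or 2 (trouble, e.g. a missing dir);
  # any non-zero status is treated as a failure here.
  if report=$(diff -ru "$expected_dir" "$actual_dir"); then
    echo "OK: $actual_dir matches $expected_dir"
    return 0
  fi
  echo "MISMATCH: $actual_dir differs from $expected_dir"
  printf '%s\n' "$report"
  # Summarize changed files per path, but only if diffstat is installed.
  if command -v diffstat >/dev/null 2>&1; then
    printf '%s\n' "$report" | diffstat
  fi
  return 1
}
```

Printing the full unified diff before the summary mirrors the page's description: the diff shows exactly which lines changed, and the diffstat summary shows which files are affected.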

Usage

Run these scripts after executing a connector integration test to verify that output matches the baseline. They are called automatically by the test orchestrators (test-ingest-src.sh, test-ingest-dest.sh) but can also be run standalone for debugging.

Code Reference

Source Location

  • Repository: unstructured
  • File: test_unstructured_ingest/check-diff-expected-output.sh (lines 1-69)
  • File: test_unstructured_ingest/check-num-files-output.sh (lines 1-24)

Signature

# Content diff validation
./test_unstructured_ingest/check-diff-expected-output.sh <OUTPUT_FOLDER_NAME>

# File count validation
./test_unstructured_ingest/check-num-files-output.sh <EXPECTED_COUNT> <OUTPUT_FOLDER_NAME>

Import

# Scripts are part of the repository, no installation needed
# Dependencies: diff, diffstat, find, wc, bash
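A quick way to confirm these dependencies before running the scripts is a preflight check. The `check_deps` helper is hypothetical and not part of the repository; it relies only on the shell builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: report any required tool missing from PATH.
# check_deps [TOOL...]   (defaults to diff, diffstat, find, wc, bash)
check_deps() {
  local missing=0 tool
  # With no arguments, fall back to the dependency list given on this page.
  [ "$#" -eq 0 ] && set -- diff diffstat find wc bash
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_deps` with no arguments prints one `missing:` line per absent tool and returns 1 if any are missing; `diffstat` is the tool most likely to be absent on a fresh machine.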

I/O Contract

Inputs (check-diff-expected-output.sh)

Name | Type | Required | Description
$1 | string | Yes | Output folder name (e.g., "s3", "azure", "local")
OVERWRITE_FIXTURES | env var | No | Set to "true" to update baselines instead of comparing
OUTPUT_ROOT | env var | No | Root output directory override
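The `OVERWRITE_FIXTURES` switch can be pictured as a branch between copying and comparing. This sketch assumes the baseline is refreshed by replacing the expected directory with a copy of the actual output; the function name `check_or_overwrite` and its directory arguments are hypothetical, and the real script's directory layout (including how `OUTPUT_ROOT` is applied) is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the OVERWRITE_FIXTURES switch; the directory
# arguments are illustrative, not the repository's actual layout.
# check_or_overwrite EXPECTED_DIR ACTUAL_DIR
check_or_overwrite() {
  local expected_dir=$1 actual_dir=$2
  if [ "${OVERWRITE_FIXTURES:-false}" = "true" ]; then
    # Replace the baseline wholesale with the freshly produced output.
    rm -rf "$expected_dir"
    mkdir -p "$(dirname "$expected_dir")"
    cp -R "$actual_dir" "$expected_dir"
    echo "Baseline updated: $expected_dir"
    return 0
  fi
  # Default path: compare, and propagate diff's exit status (non-zero on mismatch).
  diff -ru "$expected_dir" "$actual_dir"
}
```

Replacing the whole directory, rather than copying files over it, also drops stale baseline files that the connector no longer produces.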

Inputs (check-num-files-output.sh)

Name | Type | Required | Description
$1 | int | Yes | Expected number of output files
$2 | string | Yes | Output folder name
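A file count check can be approximated with `find` and `wc`, both listed as dependencies. The `check_num_files` function is a hypothetical stand-in for check-num-files-output.sh: it takes a directory path rather than a folder name and counts regular files recursively, which may differ from what the real script counts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a file-count check; not the repository's script.
# check_num_files EXPECTED_COUNT OUTPUT_DIR
check_num_files() {
  local expected=$1 output_dir=$2 actual
  # Count regular files at any depth; tr strips the padding some wc builds add.
  # (File names containing newlines would be over-counted.)
  actual=$(find "$output_dir" -type f | wc -l | tr -d ' ')
  if [ "$actual" -ne "$expected" ]; then
    echo "Expected $expected files in $output_dir, found $actual"
    return 1
  fi
  echo "Found expected $expected files in $output_dir"
}
```

A count check catches a failure mode the diff alone can miss when a baseline is regenerated from an incomplete run, such as a connector silently skipping documents.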

Outputs

Name | Type | Description
exit code | int | 0 if outputs match the baseline, 1 if a mismatch is detected
stdout | text | Unified diff report and diffstat summary on mismatch (check-diff-expected-output.sh)
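Because both scripts signal success through their exit code, a CI job can run them in sequence and fail if either reports a mismatch. The `run_checks` wrapper is a hypothetical sketch, not part of the repository; it keeps running after a failure so that both the diff report and the count result appear in the log.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: run each validation command, report PASS/FAIL,
# and return non-zero if any check failed.
# run_checks CMD1 CMD2 ...   (each CMD is one string run via bash -c)
run_checks() {
  local failures=0 cmd
  for cmd in "$@"; do
    if bash -c "$cmd"; then
      echo "PASS: $cmd"
    else
      # $? here still holds the exit status of the failed command.
      echo "FAIL (exit $?): $cmd"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

For example: run_checks "./test_unstructured_ingest/check-diff-expected-output.sh s3" "./test_unstructured_ingest/check-num-files-output.sh 5 s3"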

Usage Examples

Validate S3 Connector Output

# Run S3 connector test
./test_unstructured_ingest/src/s3.sh

# Validate output against baseline
./test_unstructured_ingest/check-diff-expected-output.sh s3

# Verify file count
./test_unstructured_ingest/check-num-files-output.sh 5 s3

Update Baselines After Intentional Change

# Run the connector test
./test_unstructured_ingest/src/local.sh

# Update expected output baselines
OVERWRITE_FIXTURES=true ./test_unstructured_ingest/check-diff-expected-output.sh local

Related Pages

Implements Principle
