Overview
Concrete tool for validating ingest pipeline output against expected baselines by diffing file contents and checking output file counts.
Description
The check-diff-expected-output.sh and check-num-files-output.sh scripts provide the validation layer for the ingest pipeline integration tests. The diff script compares every file in the actual output directory against the corresponding file in the expected output directory and reports a unified diff for each mismatch. The file count script verifies that the expected number of output files was produced.
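The comparison described above can be approximated with a recursive unified diff. The following is a minimal sketch, not the repository's implementation: the function name compare_dirs and its two directory arguments are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: compare_dirs is a hypothetical helper, not a function
# taken from check-diff-expected-output.sh.
compare_dirs() {
  local expected_dir="$1" actual_dir="$2"
  # -r recurses into subdirectories, -u prints unified diffs, and
  # -N treats a file missing on one side as empty so it appears in the report.
  if diff -ruN "$expected_dir" "$actual_dir"; then
    echo "outputs match"
    return 0
  fi
  echo "outputs differ" >&2
  return 1
}
```

diff exits with 0 when the trees are identical and 1 when they differ, which is what lets the helper map the comparison onto a pass/fail result.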
Usage
Run these scripts after executing a connector integration test to verify that output matches the baseline. They are called automatically by the test orchestrators (test-ingest-src.sh, test-ingest-dest.sh) but can also be run standalone for debugging.
Code Reference
Source Location
- Repository: unstructured
- File: test_unstructured_ingest/check-diff-expected-output.sh (lines 1-69)
- File: test_unstructured_ingest/check-num-files-output.sh (lines 1-24)
Signature
# Content diff validation
./test_unstructured_ingest/check-diff-expected-output.sh <OUTPUT_FOLDER_NAME>
# File count validation
./test_unstructured_ingest/check-num-files-output.sh <EXPECTED_COUNT> <OUTPUT_FOLDER_NAME>
Import
# Scripts are part of the repository, no installation needed
# Dependencies: diff, diffstat, find, wc, bash
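Because the scripts rely on external tools, it can help to confirm they are on the PATH before running a test. The check below is a hypothetical pre-flight sketch (missing_tools is not part of the repository); it uses only the shell built-in command -v.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the repository scripts.
# Prints the name of every requested tool that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the dependencies listed above and warn if any are absent.
missing=$(missing_tools diff diffstat find wc bash)
if [ -n "$missing" ]; then
  echo "missing tools:" $missing >&2
fi
```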
I/O Contract
Inputs (check-diff-expected-output.sh)
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| $1 | string | Yes | Output folder name (e.g., "s3", "azure", "local") |
| OVERWRITE_FIXTURES | env var | No | Set to "true" to update baselines instead of comparing |
| OUTPUT_ROOT | env var | No | Root output directory override |
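To illustrate how an overwrite switch such as OVERWRITE_FIXTURES can change a validator's behavior, here is a minimal sketch. The function validate_or_overwrite, its arguments, and the default value "false" are assumptions made for this example; the actual script may resolve its directories and defaults differently.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, arguments, and the "false" default
# are assumptions, not taken from check-diff-expected-output.sh.
validate_or_overwrite() {
  local actual_dir="$1" expected_dir="$2"
  local overwrite="${OVERWRITE_FIXTURES:-false}"
  if [ "$overwrite" = "true" ]; then
    # Replace the baseline with the freshly produced output.
    rm -rf "$expected_dir"
    mkdir -p "$expected_dir"
    cp -R "$actual_dir"/. "$expected_dir"/
    echo "baseline updated"
  else
    # Compare against the baseline; a non-zero status signals a mismatch.
    diff -ru "$expected_dir" "$actual_dir"
  fi
}
```

Keeping both modes in one entry point means the same command line serves for checking and for regenerating baselines, with only the environment variable changing.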
Inputs (check-num-files-output.sh)
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| $1 | int | Yes | Expected number of output files |
| $2 | string | Yes | Output folder name |
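A file count check of this kind can be built from find and wc, both of which appear in the dependency list. The sketch below is an assumption about how such a check might look: count_files and check_num_files are hypothetical names, and the second argument here is a full directory path, not the folder name that the real script takes.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and the directory is passed
# as a full path for simplicity.
count_files() {
  # Count regular files recursively; tr strips the padding some wc builds add.
  find "$1" -type f | wc -l | tr -d ' '
}

check_num_files() {
  local expected="$1" dir="$2" actual
  actual=$(count_files "$dir")
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected files, found $actual" >&2
    return 1
  fi
  echo "file count OK ($actual)"
}
```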
Outputs
| Name | Type | Description |
| --- | --- | --- |
| exit code | int | 0 if outputs match baseline, 1 if mismatch detected |
| stdout | text | Unified diff report and diffstat summary on mismatch |
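Since the result is communicated through the exit code, a caller can branch on it directly. The wrapper below is a hypothetical sketch (run_check is not part of the repository) showing one way to turn an exit status into a labeled pass/fail line.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs any command and reports its exit status.
run_check() {
  local label="$1"
  shift
  if "$@"; then
    echo "PASS: $label"
  else
    local rc=$?  # exit status of the command that just failed
    echo "FAIL: $label (exit $rc)"
    return "$rc"
  fi
}

# Example invocation from the repository root (not executed here):
#   run_check "s3 diff" ./test_unstructured_ingest/check-diff-expected-output.sh s3
```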
Usage Examples
Validate S3 Connector Output
# Run S3 connector test
./test_unstructured_ingest/src/s3.sh
# Validate output against baseline
./test_unstructured_ingest/check-diff-expected-output.sh s3
# Verify file count
./test_unstructured_ingest/check-num-files-output.sh 5 s3
Update Baselines After Intentional Change
# Run the connector test
./test_unstructured_ingest/src/local.sh
# Update expected output baselines
OVERWRITE_FIXTURES=true ./test_unstructured_ingest/check-diff-expected-output.sh local
Related Pages
Implements Principle