Implementation:Unstructured IO Unstructured Unstructured Ingest CLI Processing
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, ETL, CLI |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Concrete tool for configuring document processing parameters in the unstructured-ingest CLI pipeline.
Description
The unstructured-ingest CLI provides processing flags that control how documents are partitioned within the pipeline. The flags are demonstrated in the local filesystem test script (test_unstructured_ingest/src/local.sh) and apply to all source connectors. They cover partition strategy selection (--strategy), parallel worker count (--num-processes), metadata field exclusion (--metadata-exclude), file glob filtering (--file-glob), and forced reprocessing (--reprocess).
Usage
Use these CLI flags when you need to tune the ingest pipeline's processing behavior. In the command, the processing flags follow the source connector subcommand (the first local) and precede the destination subcommand (the second local, which takes --output-dir).
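
The ordering described above can be sketched by assembling the command from three argument groups. This is only an illustration: build_ingest_cmd and its variable names are hypothetical, the flag values are placeholders, and the function prints the command without executing it.

```shell
# Hypothetical helper: prints the command in the order described above.
# Nothing is executed; paths and flag values are placeholders.
build_ingest_cmd() {
  input_dir=$1
  output_dir=$2
  # Source connector subcommand and its configuration.
  source_args="local --input-path $input_dir"
  # Processing flags that tune partitioning.
  processing_args="--strategy fast --num-processes 2"
  # Destination connector subcommand and its configuration.
  dest_args="local --output-dir $output_dir"
  printf '%s\n' "unstructured-ingest $source_args $processing_args $dest_args"
}

build_ingest_cmd ./documents ./structured-output
```

Because the result is a plain string meant for display, paths containing spaces would need quoting before the printed command is reused.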
Code Reference
Source Location
- Repository: unstructured
- File: test_unstructured_ingest/src/local.sh
- Lines: 23-34
Signature
unstructured-ingest local \
  --num-processes <N> \
  --metadata-exclude <CSV_FIELDS> \
  --strategy <STRATEGY> \
  --reprocess \
  --verbose \
  --file-glob <PATTERN> \
  --input-path <DIR> \
  --work-dir <DIR> \
  local --output-dir <DIR>
Import
pip install unstructured-ingest
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --strategy | string | No | Partition strategy: auto, fast, hi_res, ocr_only |
| --num-processes | int | No | Parallel workers (default: os.cpu_count()) |
| --metadata-exclude | CSV string | No | Metadata fields to exclude (e.g., "filename,file_directory") |
| --file-glob | string | No | File pattern filter (e.g., "*.html", "*.pdf") |
| --reprocess | flag | No | Force reprocessing of already-processed files |
| --verbose | flag | No | Enable detailed logging |
| --input-path | path | Yes | Directory containing input documents |
| --work-dir | path | No | Temporary directory for intermediate files |
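
The accepted --strategy values listed above lend themselves to a pre-flight check in a wrapper script. The following is a minimal sketch, assuming the four values in the table are the complete set; is_valid_strategy is a hypothetical helper and not part of the CLI.

```shell
# Hypothetical pre-flight check: accept only the strategy names listed in the table.
is_valid_strategy() {
  case "$1" in
    auto|fast|hi_res|ocr_only) return 0 ;;
    *) return 1 ;;
  esac
}

strategy="hi_res"   # placeholder value
if is_valid_strategy "$strategy"; then
  echo "strategy ok: $strategy"
else
  echo "unknown strategy: $strategy" >&2
fi
```

Failing early in the wrapper avoids starting a run with a mistyped strategy name.
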
Outputs
| Name | Type | Description |
|---|---|---|
| JSON files | files | Partitioned element JSON files in --output-dir |
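
A quick way to confirm that a run produced output is to count the JSON files under the output directory. This is a sketch under the assumption that output files carry a .json extension; count_json_outputs is a hypothetical helper and the path is a placeholder.

```shell
# Hypothetical helper: count the .json files found under a directory (recursively).
# Assumes the pipeline's output files use a .json extension.
count_json_outputs() {
  find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

out_dir="./structured-output/local"   # placeholder output directory
if [ -d "$out_dir" ]; then
  echo "JSON files in $out_dir: $(count_json_outputs "$out_dir")"
else
  echo "no output directory at $out_dir yet"
fi
```
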
Usage Examples
Local Processing with Filtering
unstructured-ingest local \
  --input-path ./documents/ \
  --file-glob "*.html" \
  --num-processes 4 \
  --strategy fast \
  --metadata-exclude "filename,file_directory" \
  --reprocess \
  --verbose \
  --work-dir /tmp/unstructured-work \
  local --output-dir ./structured-output/local/
High-Resolution Processing
unstructured-ingest local \
  --input-path ./scanned-pdfs/ \
  --strategy hi_res \
  --num-processes 2 \
  local --output-dir ./structured-output/hi-res/