
Implementation:Unstructured IO Unstructured Unstructured Ingest CLI Processing

From Leeroopedia
Knowledge Sources
Domains Data_Ingestion, ETL, CLI
Last Updated 2026-02-12 00:00 GMT

Overview

Concrete tool for configuring document processing parameters in the unstructured-ingest CLI pipeline.

Description

The unstructured-ingest CLI provides processing configuration flags that control how documents are partitioned within the pipeline. These flags are demonstrated in the local filesystem test script and apply to all source connectors. Key flags include strategy selection, parallel worker count, metadata exclusion, file glob filtering, and reprocessing control.

Usage

Use these CLI flags when you need to tune the ingest pipeline's processing behavior. The processing flags sit between the source connector configuration and the destination configuration in the CLI command.
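The ordering described here can be sketched as a small shell helper that assembles the argument list in three groups: source connector, processing flags, destination connector. The function name `build_ingest_args` is hypothetical and the flag values are illustrative ones borrowed from the examples on this page. The helper only prints the arguments and does not invoke the CLI.

```shell
# Print the argument list, one argument per line, in source / processing /
# destination order. build_ingest_args is a hypothetical helper name.
build_ingest_args() {
    # Source connector and its options.
    set -- local --input-path ./documents/
    # Processing flags.
    set -- "$@" --strategy fast --num-processes 4 --file-glob "*.html"
    # Destination connector and its options.
    set -- "$@" local --output-dir ./structured-output/local/
    printf '%s\n' "$@"
}

# Review the assembled arguments before passing them to unstructured-ingest.
build_ingest_args
```

Keeping the three groups on separate lines makes it easier to swap the source or destination connector without touching the processing flags.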

Code Reference

Source Location

  • Repository: unstructured
  • File: test_unstructured_ingest/src/local.sh
  • Lines: 23-34

Signature

unstructured-ingest local \
    --num-processes <N> \
    --metadata-exclude <CSV_FIELDS> \
    --strategy <STRATEGY> \
    --reprocess \
    --verbose \
    --file-glob <PATTERN> \
    --input-path <DIR> \
    --work-dir <DIR> \
    local --output-dir <DIR>

Import

pip install unstructured-ingest

I/O Contract

Inputs

Name | Type | Required | Description
--strategy | string | No | Partition strategy: auto, fast, hi_res, ocr_only
--num-processes | int | No | Parallel workers (default: os.cpu_count())
--metadata-exclude | CSV string | No | Metadata fields to exclude (e.g., "filename,file_directory")
--file-glob | string | No | File pattern filter (e.g., "*.html", "*.pdf")
--reprocess | flag | No | Force reprocessing of already-processed files
--verbose | flag | No | Enable detailed logging
--input-path | path | Yes | Directory containing input documents
--work-dir | path | No | Temporary directory for intermediate files
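Because --strategy accepts a fixed set of values, a wrapper script can reject a mistyped value before starting a long run. The following is a minimal sketch under that assumption. The function name `is_valid_strategy` is hypothetical, and the accepted values are exactly the four listed for --strategy above.

```shell
# Return 0 if the argument is one of the strategy values listed above,
# 1 otherwise. is_valid_strategy is a hypothetical helper name.
is_valid_strategy() {
    case "$1" in
        auto|fast|hi_res|ocr_only) return 0 ;;
        *) return 1 ;;
    esac
}

strategy="hi_res"
if is_valid_strategy "$strategy"; then
    echo "strategy ok: $strategy"
else
    echo "unknown strategy: $strategy" >&2
fi
```

The check is purely local to the wrapper script and is independent of any validation the CLI itself performs.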

Outputs

Name | Type | Description
JSON files | files | Partitioned element JSON files in --output-dir
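A quick way to confirm that a run produced output is to count the JSON files under the output directory. This sketch assumes that output files carry a `.json` extension, which this page does not state explicitly. The function name `count_json_outputs` and the directory path are illustrative.

```shell
# Count files ending in .json under the given directory (recursively).
# Assumption: output files use the .json extension.
count_json_outputs() {
    find "$1" -type f -name '*.json' | wc -l | tr -d ' '
}

out_dir="./structured-output/local/"
if [ -d "$out_dir" ]; then
    echo "JSON files in $out_dir: $(count_json_outputs "$out_dir")"
else
    echo "output directory not found: $out_dir"
fi
```

Comparing this count with the number of input files matched by --file-glob gives a rough check that no documents were skipped.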

Usage Examples

Local Processing with Filtering

unstructured-ingest local \
    --input-path ./documents/ \
    --file-glob "*.html" \
    --num-processes 4 \
    --strategy fast \
    --metadata-exclude "filename,file_directory" \
    --reprocess \
    --verbose \
    --work-dir /tmp/unstructured-work \
    local --output-dir ./structured-output/local/

High-Resolution Processing

unstructured-ingest local \
    --input-path ./scanned-pdfs/ \
    --strategy hi_res \
    --num-processes 2 \
    local --output-dir ./structured-output/hi-res/
