Heuristic:Unstructured IO Unstructured Strategy Fallback Chain

From Leeroopedia
Knowledge Sources
Domains: Document Partitioning, Strategy Selection, Dependency Management
Last Updated: 2026-02-12 09:00 GMT

Overview

When a preferred partitioning strategy is unavailable due to missing dependencies, the system automatically falls back through a defined chain of alternative strategies rather than failing outright.

Description

The Unstructured library supports multiple partitioning strategies (hi_res, ocr_only, fast), but hi_res and ocr_only depend on optional packages (unstructured_inference and pytesseract, respectively) that may not be installed. The fallback chain ensures graceful degradation:

  • If hi_res is requested but unstructured_inference is not installed, the system falls back to ocr_only, then to fast.
  • If ocr_only is requested but pytesseract is not installed, the system falls back to fast or hi_res depending on availability.
  • Each fallback emits a logger.warning message so the user knows the requested strategy was not honored.

Additionally, the pdf_infer_table_structure parameter is deprecated in favor of skip_infer_table_types. The decide_table_extraction() function (auto.py lines 323-328) contains backward-compatibility logic that translates the old parameter into the new one for PDF table extraction, preventing breakage for users who have not yet migrated.

Usage

Apply this heuristic whenever:

  • Deploying Unstructured in environments where optional dependencies (unstructured_inference, pytesseract) may or may not be present.
  • Configuring partitioning pipelines that must degrade gracefully rather than raise import errors.
  • Migrating from pdf_infer_table_structure to skip_infer_table_types and needing to understand the backward-compat layer.
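
Before deploying, it can help to check which optional packages an environment actually provides. The following is a minimal sketch, not part of the library: STRATEGY_REQUIREMENTS, module_available, and available_strategies are hypothetical names, and the mapping assumes only the two dependencies named on this page (fast is assumed to need no optional package).

```python
from importlib.util import find_spec

# Hypothetical mapping for illustration: strategy name -> optional module it needs.
STRATEGY_REQUIREMENTS = {
    "hi_res": "unstructured_inference",
    "ocr_only": "pytesseract",
    "fast": None,  # assumption: no optional package required
}


def module_available(name):
    """Return True if `name` can be located, without importing it."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def available_strategies(is_available=module_available):
    """List the strategies whose optional dependency is present."""
    return [
        strategy
        for strategy, module in STRATEGY_REQUIREMENTS.items()
        if module is None or is_available(module)
    ]


if __name__ == "__main__":
    print("Strategies usable here:", available_strategies())
```

Running this in each target environment (local machine, CI runner, container image) shows in advance where a requested strategy would be downgraded.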

The Insight (Rule of Thumb)

  • Action: Let the system choose the best available strategy via the fallback chain rather than hard-coding a strategy that may not be available.
  • Value: The fallback order is hi_res -> ocr_only -> fast. Each step requires fewer dependencies. The deprecated pdf_infer_table_structure is automatically mapped to skip_infer_table_types, with a deprecation warning.
  • Trade-off: Falling back to a simpler strategy reduces extraction quality (e.g., no layout detection in fast mode, no table structure inference). Users must monitor warning logs to detect unintended fallbacks in production.
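
One way to monitor for unintended fallbacks is to attach a logging handler that records fallback warnings. The sketch below rests on two assumptions: that the library logs through a logger named "unstructured", and that fallback messages contain the phrase "Falling back", as in the messages quoted on this page. FallbackCounter is a hypothetical helper, and the warning is simulated rather than produced by a real partition call.

```python
import logging


class FallbackCounter(logging.Handler):
    """Collect WARNING-or-higher records whose message mentions a fallback."""

    def __init__(self, marker="Falling back"):
        super().__init__(level=logging.WARNING)
        self.marker = marker
        self.messages = []

    def emit(self, record):
        message = record.getMessage()
        if self.marker in message:
            self.messages.append(message)


# Assumption: the library's logger is named "unstructured".
library_logger = logging.getLogger("unstructured")
fallback_counter = FallbackCounter()
library_logger.addHandler(fallback_counter)

# Simulated warning, standing in for one emitted during a real partition call.
library_logger.warning("pytesseract is not installed. Falling back to fast strategy.")

if fallback_counter.messages:
    print(f"{len(fallback_counter.messages)} strategy fallback(s) detected")
```

In production the collected messages could feed a metric or fail a health check, so that a silent drop in extraction quality does not go unnoticed.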

Reasoning

Document processing pipelines are often deployed across diverse environments: local development machines, CI runners, Docker containers, and cloud functions. Not all environments have the same packages installed. By defining a deterministic fallback chain and logging each transition, the library avoids hard failures at runtime while keeping the operator informed. The deprecation bridge in decide_table_extraction() follows the same principle: preserve working behavior for existing users while nudging them toward the newer API.

Code Evidence

Strategy fallback when hi_res is unavailable (strategies.py):

# strategies.py - hi_res fallback chain
if strategy == PartitionStrategy.HI_RES:
    if not dependency_exists("unstructured_inference"):
        logger.warning(
            "unstructured_inference is not installed. Falling back to ocr_only strategy."
        )
        strategy = PartitionStrategy.OCR_ONLY

if strategy == PartitionStrategy.OCR_ONLY:
    if not dependency_exists("pytesseract"):
        logger.warning(
            "pytesseract is not installed. Falling back to fast strategy."
        )
        strategy = PartitionStrategy.FAST
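
The excerpt above can be exercised in isolation with a self-contained sketch. Here PartitionStrategy and dependency_exists are stand-ins written for illustration, not the library's own definitions, and resolve_strategy is a hypothetical wrapper name. The dependency check is injectable so the chain can be tested without installing or uninstalling packages.

```python
import logging
from enum import Enum
from importlib.util import find_spec

logger = logging.getLogger(__name__)


class PartitionStrategy(str, Enum):
    # Stand-in for the library's strategy constants.
    HI_RES = "hi_res"
    OCR_ONLY = "ocr_only"
    FAST = "fast"


def dependency_exists(module_name):
    """Stand-in for the library helper: True if the module can be located."""
    return find_spec(module_name) is not None


def resolve_strategy(strategy, exists=dependency_exists):
    """Walk hi_res -> ocr_only -> fast, warning at each downgrade."""
    if strategy == PartitionStrategy.HI_RES and not exists("unstructured_inference"):
        logger.warning(
            "unstructured_inference is not installed. Falling back to ocr_only strategy."
        )
        strategy = PartitionStrategy.OCR_ONLY
    if strategy == PartitionStrategy.OCR_ONLY and not exists("pytesseract"):
        logger.warning("pytesseract is not installed. Falling back to fast strategy.")
        strategy = PartitionStrategy.FAST
    return strategy


if __name__ == "__main__":
    # Simulate an environment with no optional packages installed.
    print(resolve_strategy(PartitionStrategy.HI_RES, exists=lambda name: False).value)
```

Because the two checks run in sequence rather than as if/elif, a hi_res request with neither package installed passes through both downgrades in a single call.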

Backward-compat for deprecated pdf_infer_table_structure (auto.py:323-328):

# auto.py:323-328 - decide_table_extraction()
def decide_table_extraction(pdf_infer_table_structure, skip_infer_table_types):
    """Backward-compat: translate deprecated param to the new one."""
    if pdf_infer_table_structure is not None:
        logger.warning("pdf_infer_table_structure is deprecated; use skip_infer_table_types")
        if not pdf_infer_table_structure:
            skip_infer_table_types = ["pdf"]
    return skip_infer_table_types
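
To see how the bridge behaves for each value of the deprecated parameter, the function can be restated and called directly. This is a sketch based only on the excerpt above. The logger is a local stand-in, and the real function may differ in details not shown on this page.

```python
import logging

logger = logging.getLogger(__name__)


def decide_table_extraction(pdf_infer_table_structure, skip_infer_table_types):
    # Restated from the excerpt above so this snippet runs on its own.
    if pdf_infer_table_structure is not None:
        logger.warning("pdf_infer_table_structure is deprecated; use skip_infer_table_types")
        if not pdf_infer_table_structure:
            skip_infer_table_types = ["pdf"]
    return skip_infer_table_types


# Old parameter unset: the caller's list passes through untouched.
unset_result = decide_table_extraction(None, ["docx"])

# Old parameter False: PDF table inference is skipped, replacing the caller's list.
false_result = decide_table_extraction(False, ["docx"])

# Old parameter True: only the deprecation warning is emitted.
true_result = decide_table_extraction(True, [])

print(unset_result, false_result, true_result)
```

Note that, as excerpted, passing pdf_infer_table_structure=False overwrites any list the caller supplied for skip_infer_table_types, so mixing the old and new parameters during migration is best avoided.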
