Implementation:Unstructured IO Unstructured CProfile Partition

From Leeroopedia
Revision as of 11:54, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Unstructured_IO_Unstructured_CProfile_Partition.md)
Knowledge Sources
Domains: Performance, Profiling
Last Updated: 2026-02-12 00:00 GMT

Overview

A concrete tool for profiling the CPU time of the partition pipeline with cProfile and viewing the results as a flamegraph.

Description

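The profiling script runs Python's cProfile module against the run_partition.py script, which calls partition() with a specified strategy. The output is a binary .prof file that can be visualized with flameprof (SVG flamegraph) or snakeviz (interactive HTML). The command passes -s cumulative, but per the Python documentation the sort order only applies when -o is not supplied, so it does not change the contents of the saved .prof file; the ordering is chosen later by whichever tool reads the file.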

Usage

Run this when you need to identify CPU-time bottlenecks in the partition pipeline. The profiling script is interactive and guides you through document and strategy selection.

Code Reference

Source Location

  • Repository: unstructured
  • File: scripts/performance/profile.sh (line 332)
  • File: scripts/performance/run_partition.py (lines 6-19)

Signature

# cProfile command (from profile.sh line 332)
python3 -m cProfile -s cumulative \
    -o "$PROFILE_RESULTS_DIR/${test_file##*/}.prof" \
    -m "scripts.performance.run_partition" "$test_file" "$strategy"

# Visualization (from profile.sh lines 198-204)
flameprof "$PROFILE_RESULTS_DIR/${test_file##*/}.prof" \
    > "$PROFILE_RESULTS_DIR/${test_file##*/}.flameprof.svg"

snakeviz "$PROFILE_RESULTS_DIR/${test_file##*/}.prof"

# run_partition.py - the profiled target
import sys, os
from unstructured.partition.auto import partition

if __name__ == "__main__":
    file_path = sys.argv[1]
    strategy = sys.argv[2]
    model_name = sys.argv[3] if len(sys.argv) > 3 else os.environ.get("PARTITION_MODEL_NAME")
    result = partition(file_path, strategy=strategy, model_name=model_name)

Import

pip install flameprof snakeviz

I/O Contract

Inputs

Name | Type | Required | Description
test_file | path | Yes | Document file to profile
strategy | string | Yes | Partition strategy (auto, fast, hi_res, ocr_only)
model_name | string | No | Layout detection model (from argv[3] or the PARTITION_MODEL_NAME environment variable)
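
The input contract above can be sketched as a small helper. This is an illustration only, not code from the repository: resolve_args and its usage message are hypothetical names, and the precedence (argv[3] first, then PARTITION_MODEL_NAME) simply mirrors the run_partition.py excerpt shown earlier.

```python
import os


def resolve_args(argv, env=None):
    """Return (file_path, strategy, model_name) from an argv-style list.

    Hypothetical helper: it restates the I/O contract, it is not part of
    scripts/performance/run_partition.py.
    """
    env = os.environ if env is None else env
    if len(argv) < 3:
        # test_file and strategy are both required.
        raise SystemExit("usage: run_partition.py <test_file> <strategy> [model_name]")
    file_path, strategy = argv[1], argv[2]
    # An explicit third argument takes precedence over the environment variable.
    model_name = argv[3] if len(argv) > 3 else env.get("PARTITION_MODEL_NAME")
    return file_path, strategy, model_name
```

Passing env explicitly keeps the helper easy to exercise without touching the real process environment; when neither source provides a model name, the third element is None.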

Outputs

Name | Type | Description
.prof file | binary | cProfile binary profile data in profile_results/
.flameprof.svg | SVG | Flamegraph visualization (via flameprof)
interactive viewer | HTML | snakeviz web UI for exploring the profile

Usage Examples

Run the Interactive Profiler

cd /path/to/unstructured
./scripts/performance/profile.sh

# Interactive prompts:
# 1. Select document (from scripts/performance/docs/ or custom path)
# 2. Select strategy (auto/fast/hi_res/ocr_only)
# 3. View results in profile_results/

Profile Directly with cProfile

python3 -m cProfile -s cumulative \
    -o ./profile_results/report.pdf.prof \
    -m scripts.performance.run_partition \
    ./documents/report.pdf hi_res

# Generate flamegraph
flameprof ./profile_results/report.pdf.prof > ./profile_results/report.pdf.svg
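
A saved .prof file can also be summarized as plain text with the standard-library pstats module, which is handy when neither visualizer is installed. The following is a minimal sketch rather than part of profile.sh: busy_work is a hypothetical stand-in for the partition call, and demo.prof is an illustrative file name.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical stand-in workload; the real target would call partition()."""
    return sum(i * i for i in range(n))


def write_profile(path, n=50_000):
    """Profile busy_work and dump binary stats, like `cProfile -o` does."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()
    profiler.dump_stats(path)


def top_cumulative(path, limit=10):
    """Return a text report of the top entries sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    # The sort order is applied here, when the file is read back.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    prof_path = os.path.join(tempfile.mkdtemp(), "demo.prof")
    write_profile(prof_path)
    print(top_cumulative(prof_path))
```

To inspect a real run, call top_cumulative with the path written by the cProfile command, for example ./profile_results/report.pdf.prof.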

Related Pages

Implements Principle

Requires Environment
