Implementation:Unstructured IO Unstructured CProfile Partition
| Knowledge Sources | |
|---|---|
| Domains | Performance, Profiling |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Concrete tool for time-profiling the partition pipeline using cProfile and flamegraph visualization.
Description
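The profiling script runs Python's cProfile module against the run_partition.py script, which calls partition() with a specified strategy. The command passes -s cumulative, but cProfile applies a sort order only to printed output; because -o is also given, the flag has no effect on the result. The output is a binary .prof file holding the raw statistics, which can be visualized with flameprof (SVG flamegraph) or snakeviz (interactive browser view).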
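A .prof file can also be inspected without a visualizer, using the standard-library pstats module. The following is a minimal sketch, not part of the repository: busy_work is a hypothetical stand-in for partition(), and the helper names are illustrative. It assumes Python 3.9 or later, where pstats.Stats.get_stats_profile() is available.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n: int) -> int:
    """Hypothetical stand-in workload; the real target is partition()."""
    return sum(i * i for i in range(n))


def profile_to_file(prof_path: str) -> None:
    """Profile busy_work() and write binary stats, as `cProfile -o` does."""
    profiler = cProfile.Profile()
    profiler.runcall(busy_work, 50_000)
    profiler.dump_stats(prof_path)


def top_cumulative(prof_path: str, limit: int = 5) -> list[tuple[str, float]]:
    """Return (function name, cumulative seconds) pairs, slowest first."""
    profile = pstats.Stats(prof_path).get_stats_profile()
    ranked = sorted(
        profile.func_profiles.items(),
        key=lambda item: item[1].cumtime,
        reverse=True,
    )
    return [(name, stats.cumtime) for name, stats in ranked[:limit]]


def run_demo() -> list[tuple[str, float]]:
    """Write a throwaway .prof file and rank its entries by cumulative time."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        prof_path = os.path.join(tmp_dir, "demo.prof")
        profile_to_file(prof_path)
        return top_cumulative(prof_path, limit=10)
```

Pointing top_cumulative at a file under profile_results/ should give a quick text ranking of the same data that flameprof and snakeviz render graphically.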
Usage
Run this when you need to identify CPU-time bottlenecks in the partition pipeline. The profiling script is interactive and guides you through document and strategy selection.
Code Reference
Source Location
- Repository: unstructured
- File: scripts/performance/profile.sh (line 332)
- File: scripts/performance/run_partition.py (lines 6-19)
Signature
# cProfile command (from profile.sh line 332)
python3 -m cProfile -s cumulative \
-o "$PROFILE_RESULTS_DIR/${test_file##*/}.prof" \
-m "scripts.performance.run_partition" "$test_file" "$strategy"
# Visualization (from profile.sh lines 198-204)
flameprof "$PROFILE_RESULTS_DIR/${test_file##*/}.prof" \
> "$PROFILE_RESULTS_DIR/${test_file##*/}.flameprof.svg"
snakeviz "$PROFILE_RESULTS_DIR/${test_file##*/}.prof"
# run_partition.py - the profiled target
import os
import sys

from unstructured.partition.auto import partition

if __name__ == "__main__":
    file_path = sys.argv[1]
    strategy = sys.argv[2]
    # Optional third argument; falls back to the PARTITION_MODEL_NAME environment variable.
    model_name = sys.argv[3] if len(sys.argv) > 3 else os.environ.get("PARTITION_MODEL_NAME")
    result = partition(file_path, strategy=strategy, model_name=model_name)
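The model_name fallback above can be isolated into a small function to make the precedence explicit. This is an illustrative sketch only: resolve_model_name is a hypothetical helper that does not exist in the repository, and the model names in the usage note are placeholders.

```python
from typing import Mapping, Optional, Sequence


def resolve_model_name(argv: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Mirror the precedence in run_partition.py (hypothetical helper).

    argv[3] wins when present; otherwise the PARTITION_MODEL_NAME
    environment variable is used; otherwise the result is None.
    """
    if len(argv) > 3:
        return argv[3]
    return env.get("PARTITION_MODEL_NAME")
```

For example, resolve_model_name(["run_partition.py", "doc.pdf", "hi_res"], {"PARTITION_MODEL_NAME": "env_model"}) returns "env_model", while adding a fourth argv entry overrides the environment value.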
Import
pip install flameprof snakeviz
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| test_file | path | Yes | Document file to profile |
| strategy | string | Yes | Partition strategy (auto, fast, hi_res, ocr_only) |
| model_name | string | No | Layout detection model (from argv[3] or PARTITION_MODEL_NAME env) |
Outputs
| Name | Type | Description |
|---|---|---|
| .prof file | binary | cProfile binary profile data in profile_results/ |
| .flameprof.svg | SVG | Flamegraph visualization (via flameprof) |
| interactive viewer | HTML | snakeviz web UI for exploring the profile |
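The output file names come from the shell expansion ${test_file##*/}, which strips everything up to the last slash and leaves the document's base name. A rough Python equivalent is sketched below; output_paths is a hypothetical helper, and the directory names in the usage note are examples rather than values confirmed by the script.

```python
from pathlib import PurePosixPath


def output_paths(test_file: str, results_dir: str) -> dict[str, str]:
    """Derive output paths the way profile.sh names them (illustrative).

    PurePosixPath(...).name plays the role of ${test_file##*/} for
    ordinary file paths (no trailing slash).
    """
    base_name = PurePosixPath(test_file).name
    return {
        "prof": f"{results_dir}/{base_name}.prof",
        "flamegraph": f"{results_dir}/{base_name}.flameprof.svg",
    }
```

Because the name depends only on the document's base name, profiling the same document twice with different strategies writes to the same .prof path.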
Usage Examples
Run the Interactive Profiler
cd /path/to/unstructured
./scripts/performance/profile.sh
# Interactive prompts:
# 1. Select document (from scripts/performance/docs/ or custom path)
# 2. Select strategy (auto/fast/hi_res/ocr_only)
# 3. View results in profile_results/
Profile Directly with cProfile
python3 -m cProfile -s cumulative \
-o ./profile_results/report.pdf.prof \
-m scripts.performance.run_partition \
./documents/report.pdf hi_res
# Generate flamegraph
flameprof ./profile_results/report.pdf.prof > ./profile_results/report.pdf.flameprof.svg
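When the direct invocation needs to be scripted, for instance to profile several documents in a loop, the command can be assembled in Python. The sketch below is an assumption-laden illustration: build_profile_command is a hypothetical helper, the strategy list is taken from the Inputs table, and -s cumulative is left out because cProfile ignores the sort flag when -o is supplied.

```python
import sys

# Strategies listed in the Inputs table of this page.
STRATEGIES = ("auto", "fast", "hi_res", "ocr_only")


def build_profile_command(test_file: str, strategy: str, prof_path: str) -> list[str]:
    """Build the argv list for a direct cProfile run (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {STRATEGIES}")
    return [
        sys.executable, "-m", "cProfile",
        "-o", prof_path,  # binary stats output; sort flags would be ignored here
        "-m", "scripts.performance.run_partition",
        test_file, strategy,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root, so that the scripts.performance.run_partition module is importable.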