Implementation:Unstructured IO Unstructured Memray Partition
| Knowledge Sources | |
|---|---|
| Domains | Performance, Profiling, Memory_Management |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Concrete tool for memory-profiling the partition pipeline with memray, offering five visualization modes.
Description
The profiling script uses memray to record every memory allocation made during a partition run and stores the results in a binary .bin file. Five visualization modes are available: flamegraph (HTML), table (HTML), tree (CLI), summary (CLI), and stats (CLI). The HTML modes write a report file, and the CLI modes print to the terminal. Together they show where a document-processing pipeline allocates memory, which is the starting point for reducing it.
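The following shell sketch shows how the five modes fit together by running every reporter against one recording. `render_memray_reports` and the `DRY_RUN` switch are hypothetical conveniences and are not part of profile.sh. The memray subcommands and their `-o` usage match the ones in the Signature section. The sketch assumes `$1` is the recording path without its `.bin` suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper: render all five memray visualization modes for one recording.
# $1 is the recording path without the .bin suffix, e.g. ./profile_results/report.pdf
render_memray_reports() {
  local file="$1" mode
  # HTML reporters write their report to the path given with -o
  run memray flamegraph -o "${file}.memray.html" "${file}.bin"
  run memray table -o "${file}.table.html" "${file}.bin"
  # CLI reporters print to the terminal
  for mode in tree summary stats; do
    run memray "$mode" "${file}.bin"
  done
}
```

For example, `DRY_RUN=1 render_memray_reports ./profile_results/report.pdf` lists the five commands. Dropping `DRY_RUN=1` runs them.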
Usage
Run this when partition operations consume too much memory, especially when processing large documents. The memory profile reveals which functions are the top allocators and where peak memory usage occurs.
Code Reference
Source Location
- Repository: unstructured
- File: scripts/performance/profile.sh (line 334 for recording, lines 101-178 for visualization)
Signature
```bash
# Record memory profile (profile.sh line 334)
python3 -m memray run \
  -o "$PROFILE_RESULTS_DIR/${test_file##*/}.bin" \
  -m "scripts.performance.run_partition" "$test_file" "$strategy"

# Visualization modes (profile.sh lines 161-178)
memray flamegraph -o "${file}.memray.html" "${file}.bin"
memray table -o "${file}.table.html" "${file}.bin"
memray tree "${file}.bin"
memray summary "${file}.bin"
memray stats "${file}.bin"
```
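The recording path uses the shell parameter expansion `${test_file##*/}`. This expansion strips everything up to the last `/` and leaves the document's basename. The visualization commands then use a `${file}` prefix. Judging from the Usage Examples, the sketch below assumes this prefix is `<results dir>/<basename>`. `memray_output_paths` is a hypothetical helper for illustration only.

```bash
# Hypothetical helper: derive memray output paths the way the commands above name them.
# Assumes ${file} in the visualization step is "<results dir>/<document basename>".
memray_output_paths() {
  local test_file="$1" results_dir="$2"
  local base="${test_file##*/}"        # ./documents/report.pdf -> report.pdf
  local file="${results_dir}/${base}"  # prefix shared by all outputs
  printf '%s\n' "${file}.bin" "${file}.memray.html" "${file}.table.html"
}

memray_output_paths ./documents/report.pdf ./profile_results
# ./profile_results/report.pdf.bin
# ./profile_results/report.pdf.memray.html
# ./profile_results/report.pdf.table.html
```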
Import
```bash
pip install "memray>=1.7.0"
```
memray runs only on Linux and macOS.
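You can check the `>=1.7.0` pin before profiling. This is a minimal sketch: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version ordering), as GNU coreutils does. The installed version is read with Python's standard `importlib.metadata`.

```bash
# Hypothetical helper: succeed if version $1 >= version $2 (needs a sort with -V).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Read the installed memray version via Python's importlib.metadata (empty if absent).
installed="$(python3 -c 'import importlib.metadata as m; print(m.version("memray"))' 2>/dev/null || true)"

if [ -n "$installed" ] && version_ge "$installed" 1.7.0; then
  echo "memray $installed satisfies >=1.7.0"
else
  echo "memray >=1.7.0 not found; install it with: pip install \"memray>=1.7.0\"" >&2
fi
```

A plain string comparison would be wrong here, because "1.11.0" sorts before "1.7.0" lexically. Version ordering handles that case.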
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| test_file | path | Yes | Document file to profile |
| strategy | string | Yes | Partition strategy (auto, fast, hi_res, ocr_only) |
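Checking both inputs before starting a profiling run avoids recording a profile of an immediate failure. The guard below is a hypothetical sketch, not code from profile.sh. It checks that `test_file` exists and that `strategy` is one of the four values listed in the table.

```bash
# Hypothetical guard: accept only the strategies listed in the Inputs table.
validate_strategy() {
  case "$1" in
    auto|fast|hi_res|ocr_only) return 0 ;;
    *)
      echo "unknown strategy: $1 (expected auto, fast, hi_res or ocr_only)" >&2
      return 1
      ;;
  esac
}

# Hypothetical guard: check both required inputs before starting a memray run.
check_inputs() {
  local test_file="$1" strategy="$2"
  if [ ! -f "$test_file" ]; then
    echo "no such file: $test_file" >&2
    return 1
  fi
  validate_strategy "$strategy"
}
```

Usage: `check_inputs ./documents/report.pdf hi_res && python3 -m memray run ...`.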
Outputs
| Name | Type | Description |
|---|---|---|
| .bin file | binary | Memray binary profile data |
| .memray.html | HTML | Memory flamegraph showing allocation call stacks |
| .table.html | HTML | Top memory allocators table |
| tree output | CLI text | Call tree of the allocations live at peak memory usage |
| summary output | CLI text | Per-function memory (own and total) at peak usage |
| stats output | CLI text | Allocation counts, size histogram, and top allocating locations |
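Only the first three outputs are files. The tree, summary, and stats modes print to the terminal and leave no file behind. The following hypothetical sketch confirms that a run produced the file outputs. It takes the same `${file}` prefix the visualization commands use.

```bash
# Hypothetical check: report which file-based outputs exist (and are non-empty)
# for a recording prefix such as ./profile_results/report.pdf
check_memray_outputs() {
  local file="$1" missing=0 ext
  for ext in bin memray.html table.html; do
    if [ -s "${file}.${ext}" ]; then
      echo "ok      ${file}.${ext}"
    else
      echo "missing ${file}.${ext}"
      missing=1
    fi
  done
  return "$missing"   # non-zero if any file output is absent or empty
}
```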
Usage Examples
Record and Visualize Memory Profile
```bash
# Record memory profile (run from the repository root so the
# scripts.performance.run_partition module can be imported)
mkdir -p ./profile_results
python3 -m memray run \
  -o ./profile_results/report.pdf.bin \
  -m scripts.performance.run_partition \
  ./documents/report.pdf hi_res

# Generate flamegraph
memray flamegraph -o ./profile_results/report.pdf.memray.html \
  ./profile_results/report.pdf.bin

# View allocation table
memray table -o ./profile_results/report.pdf.table.html \
  ./profile_results/report.pdf.bin

# Quick summary
memray summary ./profile_results/report.pdf.bin
```
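The same commands extend to several documents. The loop below is a hypothetical sketch, not part of profile.sh. It assumes it runs from the repository root, so that `scripts.performance.run_partition` resolves as a module. It also passes memray's `--force` flag so that reruns overwrite earlier `.bin` files, which `memray run` otherwise refuses to do. `DRY_RUN=1` prints the commands instead of running them.

```bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical batch loop: one memray recording plus a summary per PDF in a directory.
profile_dir() {
  local docs_dir="$1" strategy="$2" results_dir="${3:-./profile_results}" doc out
  run mkdir -p "$results_dir"
  for doc in "$docs_dir"/*.pdf; do
    [ -e "$doc" ] || continue                 # no PDFs: the glob stays unexpanded
    out="${results_dir}/${doc##*/}.bin"       # same naming as profile.sh
    # --force lets memray overwrite a .bin file left by an earlier run
    run python3 -m memray run -o "$out" --force \
      -m scripts.performance.run_partition "$doc" "$strategy"
    run memray summary "$out"
  done
}
```

Usage: `profile_dir ./documents hi_res` records and summarizes every PDF in `./documents`.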
Via Interactive Profile Script
```bash
./scripts/performance/profile.sh
# Select document, strategy, then choose "Memory profiling" mode
# Results saved to profile_results/
```