Environment:Unstructured IO Unstructured Profiling Tools

From Leeroopedia
Knowledge Sources
Domains: Performance Profiling
Last Updated: 2026-02-12 09:00 GMT

Overview

The Profiling_Tools environment provides memory and CPU profiling utilities for benchmarking and optimizing the performance of unstructured's partitioning functions.

Description

This environment aggregates several profiling tools used to measure and visualize the performance characteristics of document partitioning. The primary tools are memray for memory profiling and cProfile (Python stdlib) for CPU profiling, with visualization provided by flameprof, snakeviz, and optionally speedscope.
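As a rough illustration of how these tools chain together, the sketch below builds command lines for one memory pass and one CPU pass. The helper name `build_profiling_commands` and the output file names are hypothetical, and profile.sh may invoke the tools differently; only the subcommands shown (`memray run -o`, `memray flamegraph`, `python -m cProfile -o`, `snakeviz`, `flameprof`) are taken from the tools' command-line interfaces.

```python
from pathlib import Path


def build_profiling_commands(script: str, out_dir: str = "profile_out") -> dict[str, list[list[str]]]:
    """Return command lines for a memory pass and a CPU pass (hypothetical helper)."""
    out = Path(out_dir)
    mem_bin = str(out / "memray.bin")  # assumed name for memray's capture file
    cpu_prof = str(out / "cpu.prof")   # assumed name for the cProfile stats file
    return {
        "memory": [
            # Record allocations while the script runs.
            ["memray", "run", "-o", mem_bin, script],
            # Render the capture as an HTML flame graph.
            ["memray", "flamegraph", mem_bin],
        ],
        "cpu": [
            # Record CPU time per function with the stdlib profiler.
            ["python", "-m", "cProfile", "-o", cpu_prof, script],
            # Explore the stats interactively in a browser.
            ["snakeviz", cpu_prof],
            # flameprof prints SVG to stdout, so redirect it to a file when running.
            ["flameprof", cpu_prof],
        ],
    }


if __name__ == "__main__":
    for kind, commands in build_profiling_commands("run_partition.py").items():
        for command in commands:
            print(kind, ":", " ".join(command))
```

The commands are returned as lists rather than executed, so they can be inspected first or passed to `subprocess.run` one at a time.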

The profile.sh script (lines 29-38) validates that memray and flameprof are available at runtime before proceeding with profiling runs. A critical platform limitation is documented at line 26: memray does not build wheels for ARM-Linux, which means profiling cannot be run inside ARM Docker containers on Apple M1 Macs.
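The same two checks can be approximated from Python. This is a sketch only, not a port of profile.sh: the function names are hypothetical, and the list of ARM machine strings is an assumption about what `platform.machine()` reports in such containers.

```python
import platform
import shutil


def missing_tools(tools=("memray", "flameprof")):
    """Return the tools that are not on PATH (similar in spirit to `command -v`)."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_arm_linux(system=None, machine=None):
    """True on Linux with an ARM CPU, e.g. inside an ARM Docker container."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    # Assumption: ARM hosts report a machine string such as "aarch64" or "armv7l".
    return system == "Linux" and machine.startswith(("aarch64", "arm"))


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing profiling tools:", ", ".join(absent))
    if is_arm_linux():
        print("ARM-Linux detected: memray may not be installable here.")
```

Passing `system` and `machine` explicitly makes the platform check easy to exercise on any host.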

The time_partition.py utility provides a lightweight alternative that only requires the core unstructured package, using Python's built-in cProfile module without any additional profiling dependencies.
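A minimal sketch of stdlib-only CPU profiling in this style. It does not reproduce time_partition.py: `profile_call` is a hypothetical helper, and `fake_partition` is a stand-in workload used so the example runs without unstructured installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the outermost expensive calls appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def fake_partition(n):
    """Stand-in workload; swap in unstructured's partition(filename=...) in practice."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(fake_partition, 100_000)
    print(report)
```

To inspect the results in snakeviz or flameprof instead of as text, the profiler's `dump_stats(path)` method writes the same data to a file.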

Optionally, Docker can be used for containerized profiling, controlled by the DOCKER_TEST environment variable. The py-spy sampling profiler can produce output viewable in speedscope (an npm package).
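A minimal sketch of how a py-spy invocation targeting speedscope might be assembled. The helper name and the default output file name are hypothetical; the `record` subcommand and the `--format speedscope`, `-o`, and `--pid` options come from py-spy's command-line interface.

```python
def py_spy_record_command(script=None, output="profile.speedscope.json", pid=None):
    """Build a `py-spy record` command that writes speedscope-format output."""
    if script is None and pid is None:
        raise ValueError("Provide either a script to launch or a pid to attach to.")
    cmd = ["py-spy", "record", "--format", "speedscope", "-o", output]
    if pid is not None:
        # Attach to an already running process (often needs elevated privileges).
        cmd += ["--pid", str(pid)]
    else:
        # Let py-spy launch the program itself as a child process.
        cmd += ["--", "python", script]
    return cmd


if __name__ == "__main__":
    print(" ".join(py_spy_record_command("run_partition.py")))
```

The resulting JSON file can then be loaded into speedscope for viewing.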

Usage

Use this environment when you need to identify memory leaks, CPU bottlenecks, or overall performance regressions in the partitioning pipeline. It is primarily intended for development and benchmarking, not for production deployments.

System Requirements

Category     | Requirement                | Notes
Python       | >= 3.11, < 3.14            | Required Python version range
OS           | Linux x86_64 (recommended) | memray does not support ARM-Linux
Architecture | x86_64 only for memray     | ARM-Linux wheels are not available
Docker       | Optional                   | Required if using DOCKER_TEST for containerized profiling
Node.js/npm  | Optional                   | Required only for speedscope visualization

Dependencies

System Packages

  • docker -- optional, for containerized profiling (controlled by DOCKER_TEST env var)
  • npm -- optional, for installing speedscope visualization tool

Python Packages

  • memray >= 1.7.0 -- memory profiler with flame graph output (from scripts/performance/requirements.txt)
  • flameprof >= 0.4 -- converts cProfile output to flame graph SVG (from scripts/performance/requirements.txt)
  • snakeviz >= 2.2.0 -- interactive browser-based cProfile visualizer (from scripts/performance/requirements.txt)
  • py-spy >= 0.3.14 -- sampling profiler for Python (from scripts/performance/requirements.txt)
  • cProfile -- Python standard library module (no installation needed)
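The version floors above can be checked against the active environment with a short script. This is a sketch under stated assumptions: `check_packages`, `parse_version`, and `meets_minimum` are hypothetical helpers, and the version parser is deliberately simplistic (it reads only leading numeric components); `packaging.version` is the more robust choice where it is available.

```python
import re
from importlib import metadata

# Minimum versions as listed for scripts/performance/requirements.txt.
MINIMUMS = {
    "memray": (1, 7, 0),
    "flameprof": (0, 4),
    "snakeviz": (2, 2, 0),
    "py-spy": (0, 3, 14),
}


def parse_version(text):
    """Parse leading numeric components, e.g. '2.2.0rc1' -> (2, 2, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions after zero-padding both to the same length."""
    have = parse_version(installed)
    width = max(len(have), len(minimum))
    padded_have = have + (0,) * (width - len(have))
    padded_min = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_have >= padded_min


def check_packages(minimums=MINIMUMS):
    """Return a status per package: 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_packages().items():
        print(f"{name}: {status}")
```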

Optional npm Packages

  • speedscope -- web-based viewer for py-spy profiling output

Credentials

  • DOCKER_TEST -- environment variable (a configuration flag, not a secret); set it to enable containerized profiling via Docker

Quick Install

# Install Python profiling tools
pip install -r scripts/performance/requirements.txt

# Or install individually
pip install "memray>=1.7.0" "flameprof>=0.4" "snakeviz>=2.2.0" "py-spy>=0.3.14"

# Optional: install speedscope for py-spy visualization
npm install -g speedscope

Code Evidence

Runtime validation of profiling tools (profile.sh:29-38):

# Validate that memray is installed
if ! command -v memray &> /dev/null; then
    echo "memray is not installed. Install with: pip install memray"
    exit 1
fi

# Validate that flameprof is installed
if ! command -v flameprof &> /dev/null; then
    echo "flameprof is not installed. Install with: pip install flameprof"
    exit 1
fi

ARM-Linux limitation note (profile.sh:26):

# NOTE: memray does not build wheels for ARM-Linux.
# Cannot run in ARM Docker on M1 Mac.

Lightweight timing with cProfile (time_partition.py):

import cProfile
# cProfile is Python stdlib - no extra install needed
cProfile.run('partition(filename=input_file)', output_file)

Common Errors

Error Message | Cause | Solution
memray is not installed | memray package not found in the environment | Install via pip install memray
flameprof is not installed | flameprof package not found in the environment | Install via pip install flameprof
ERROR: Could not build wheels for memray | Attempting to install memray on ARM-Linux architecture | Use an x86_64 machine or VM; memray does not support ARM-Linux
command not found: speedscope | speedscope npm package not installed globally | Install via npm install -g speedscope
PermissionError: [Errno 1] Operation not permitted | py-spy requires elevated privileges for process attachment | Run py-spy with sudo, or on Linux let py-spy launch the program itself (py-spy record -- python ...) instead of attaching to a running process with --pid

Compatibility Notes

  • memray does not build wheels for ARM-Linux, so profiling cannot be performed in ARM Docker containers (e.g., running on Apple M1/M2 Macs with Docker)
  • cProfile is part of the Python standard library and requires no additional installation; use time_partition.py for lightweight benchmarking without external dependencies
  • py-spy is a sampling profiler that attaches to running processes and may require root privileges on some systems
  • snakeviz launches a local web server for interactive exploration of cProfile output
  • For production performance monitoring, consider application-level metrics rather than these development-focused profiling tools
