Heuristic:Unstructured IO Unstructured Multi Python Matrix

From Leeroopedia
Revision as of 10:46, 16 February 2026 by Admin (talk | contribs) (Auto-imported from heuristics/Unstructured_IO_Unstructured_Multi_Python_Matrix.md)
Knowledge Sources
Domains CI_CD, Testing, Compatibility
Last Updated 2026-02-12 09:30 GMT

Overview

Run the test suite against every supported Python version (3.11, 3.12, 3.13) with a CI matrix strategy so that version-specific regressions surface in CI rather than in a release.

Description

The Unstructured CI workflow uses a matrix strategy in GitHub Actions to run the test suite against Python 3.11, 3.12, and 3.13. The setup job caches dependencies for all three versions, and the test_unit job runs the full test suite against each. This ensures that the library works correctly across its entire supported Python version range (>=3.11, <3.14 as declared in pyproject.toml).
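
The relationship between the matrix and the declared version range can be checked mechanically. The following is a minimal sketch, not part of the Unstructured repository: the helpers `parse_version`, `satisfies`, and `uncovered_versions` are hypothetical names, and the parsing only handles the simple comma-separated comparison form shown in pyproject.toml, not the full version specifier grammar.

```python
import operator

# Operators supported by this simplified sketch. Two-character tokens come
# first so that ">=" is matched before ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.11' into (3, 11) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every clause of a spec like '>=3.11,<3.14'."""
    for clause in specifier.split(","):
        clause = clause.strip()
        for token, compare in _OPS.items():
            if clause.startswith(token):
                bound = parse_version(clause[len(token):])
                if not compare(parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def uncovered_versions(specifier: str, candidates: list[str], matrix: list[str]) -> list[str]:
    """Candidates allowed by the specifier but missing from the CI matrix."""
    return [v for v in candidates if satisfies(v, specifier) and v not in matrix]


matrix = ["3.11", "3.12", "3.13"]
candidates = ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
print(uncovered_versions(">=3.11,<3.14", candidates, matrix))  # prints []
```

An empty result means every version admitted by the specifier has a matrix entry. Comparing tuples of integers matters here: as plain strings, "3.9" would sort after "3.11".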

Additionally, the test_unit_dependency_extras job runs a matrix of per-extra isolation tests (csv, docx, odt, markdown, pypandoc, pdf-image, pptx, xlsx), verifying that each optional dependency group functions correctly when installed individually.

Usage

Apply this heuristic whenever:

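  • Adding code that depends on features introduced in a specific Python version. Match statements and X | Y type unions (3.10) and exception groups (3.11) are available on every supported version, so the risk lies in newer additions such as the type parameter syntax introduced in 3.12.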
  • Modifying dependencies that may have different behavior across Python versions.
  • Evaluating whether to adopt a new Python version or drop support for an older one.
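
A version-gated import is a typical case where only one matrix entry exercises a given branch. The sketch below is illustrative and not taken from the Unstructured codebase: it relies on `itertools.batched`, which was added in Python 3.12, and supplies a hand-written fallback for 3.11. Without a 3.11 job in the matrix, the fallback branch would never run in CI.

```python
import sys
from itertools import islice

if sys.version_info >= (3, 12):
    # itertools.batched is available from Python 3.12 onward.
    from itertools import batched
else:
    def batched(iterable, n):
        """Fallback for Python 3.11: yield tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        # islice returns an empty tuple once the iterator is exhausted,
        # which ends the loop.
        while chunk := tuple(islice(iterator, n)):
            yield chunk


print(list(batched(range(5), 2)))  # prints [(0, 1), (2, 3), (4,)]
```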

The Insight (Rule of Thumb)

  • Action: Run the full test suite across all supported Python versions (3.11, 3.12, 3.13) in CI using a matrix strategy. Also test each optional extra in isolation.
  • Value: Catches version-specific regressions before they reach users. The per-extra matrix catches hidden cross-dependencies between optional packages.
  • Trade-off: Matrix testing multiplies the number of CI jobs (3 for the Python-version matrix, 8 for the extras matrix), which raises total compute time even when the jobs run in parallel. Dependency caching in the setup job mitigates installation overhead but adds cache invalidation complexity.
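
The job counts in the trade-off follow from how a matrix expands: GitHub Actions creates one job per combination of the matrix values. A minimal sketch of that expansion, using a hypothetical helper `expand_matrix` and the values quoted on this page:

```python
from itertools import product


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


python_matrix = {"python-version": ["3.11", "3.12", "3.13"]}
extras_matrix = {
    "extra": ["csv", "docx", "odt", "markdown", "pypandoc", "pdf-image", "pptx", "xlsx"]
}

print(len(expand_matrix(python_matrix)))  # prints 3
print(len(expand_matrix(extras_matrix)))  # prints 8

# Hypothetical: crossing both axes in a single matrix would give 3 * 8 jobs.
crossed = {**python_matrix, **extras_matrix}
print(len(expand_matrix(crossed)))  # prints 24
```

Keeping the two axes in separate jobs, as the excerpts on this page show, costs 3 + 8 jobs rather than 24.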

Reasoning

Python minor versions introduce behavioral changes in the standard library, type system, and import machinery. Libraries like Unstructured that support multiple Python versions must verify compatibility across all of them. The per-extra isolation matrix is particularly important because optional dependencies (e.g., pdf-image, docx) may introduce transitive dependencies that conflict across Python versions or with each other. Testing them independently reveals such conflicts.
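
One way to make hidden cross-dependencies visible inside an isolated environment is to check which optional modules are actually importable before the tests run. The sketch below assumes nothing about Unstructured's own test helpers: `missing_modules` is a hypothetical function, and the module names in the demonstration are placeholders chosen so the example runs anywhere.

```python
from importlib.util import find_spec


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# "json" ships with Python; the second name is a placeholder that is not installed.
print(missing_modules(["json", "no_such_module_for_demo"]))
# prints ['no_such_module_for_demo']
```

In an environment built with a single extra, a test for that extra that passes only when another extra's modules are present points to an undeclared dependency.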

Code Evidence

Multi-version setup job (ci.yml):

jobs:
  setup:
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache dependencies
        uses: actions/cache@v4

Per-extra isolation matrix (ci.yml):

  test_unit_dependency_extras:
    strategy:
      matrix:
        extra: [csv, docx, odt, markdown, pypandoc, pdf-image, pptx, xlsx]
    steps:
      - name: Install specific extra
        run: uv sync --frozen --extra ${{ matrix.extra }} --group test
      - name: Run extra-specific tests
        run: make test-extra-${{ matrix.extra }} CI=true

Python version constraint (pyproject.toml):

[project]
requires-python = ">=3.11,<3.14"
