Environment:Unstructured IO Unstructured Libmagic

From Leeroopedia
Knowledge Sources
Domains: File Type Detection
Last Updated: 2026-02-12 09:00 GMT

Overview

The Libmagic environment provides MIME type detection capabilities via the python-magic wrapper around the libmagic C library, enabling accurate identification of textual file types during partitioning.

Description

The python-magic package serves as the Python binding for the system-level libmagic library, which performs MIME type detection by inspecting file content signatures (magic bytes). In filetype.py (lines 56-60), the module attempts a dynamic import via importlib.import_module("magic") and falls back to the filetype package if libmagic is unavailable. Without libmagic installed, textual file types -- including CSV, EML, HTML, MD, RST, RTF, TSV, and TXT -- cannot be reliably detected, as these formats lack distinct binary magic byte signatures.

When libmagic is unavailable, the logger emits a warning (filetype.py, lines 435-438) telling users that file type detection will be degraded. Older versions of libmagic shipped in some Docker images may also misclassify .json files as text/plain (filetype.py, lines 638-640).
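
The import-and-fallback pattern described above can be sketched as follows. This is a minimal illustration, not the code in filetype.py: the helper names load_magic and detect_mime are hypothetical, and the only python-magic call used is magic.from_buffer(data, mime=True). The module is passed in as a parameter so that the "libmagic unavailable" path can be exercised on a machine that has no libmagic.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_magic() -> Optional[ModuleType]:
    """Return the python-magic module, or None if it cannot be imported.

    Hypothetical helper: mirrors the try/except import pattern only.
    """
    try:
        return importlib.import_module("magic")
    except ImportError:
        # python-magic raises ImportError when the libmagic C library is missing.
        return None


def detect_mime(data: bytes, magic_module: Optional[ModuleType]) -> Optional[str]:
    """Detect a MIME type from content, or return None without libmagic."""
    if magic_module is None:
        # Caller is expected to fall back to another detection strategy.
        return None
    return magic_module.from_buffer(data, mime=True)
```

A caller would typically write detect_mime(raw_bytes, load_magic()) and treat a None result as the signal to use a fallback detector.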

Usage

This environment is required whenever the Detect_Filetype implementation is invoked to determine the MIME type of an input document. It is a core prerequisite for the automatic partitioning pipeline, which routes documents to the correct parser based on detected file type.

System Requirements

Category | Requirement | Notes
Python | >= 3.11, < 3.14 | Required Python version range
OS | Linux (Ubuntu/Debian recommended) | Also works on macOS and Windows with an appropriate libmagic installation
C Library | libmagic1 | Must be installed at the system level; python-magic is only a wrapper

Dependencies

System Packages

  • libmagic1 -- the core C library for MIME type detection (install via apt-get install libmagic1 on Ubuntu/Debian)
  • libmagic-dev -- development headers (may be needed for building python-magic from source)

Python Packages

  • python-magic >= 0.4.27, < 1.0.0 -- Python wrapper for libmagic (declared in pyproject.toml)

Credentials

No credentials or API keys are required for this environment.

Quick Install

# Install the system-level libmagic library (Ubuntu/Debian)
sudo apt-get update && sudo apt-get install -y libmagic1

# Install the Python wrapper
pip install "python-magic>=0.4.27,<1.0.0"
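
After installing, it can be useful to confirm that both pieces are discoverable. The sketch below uses only the Python standard library; libmagic_status is a hypothetical helper name. It assumes the C library is registered under the name "magic" and sits on the default search path. ctypes.util.find_library can miss libraries in non-standard locations, so a None result is a hint rather than proof that libmagic is absent.

```python
import ctypes.util
import importlib.util
from typing import Dict, Optional


def libmagic_status() -> Dict[str, Optional[str]]:
    """Report whether the C library and the Python wrapper are discoverable."""
    # find_library returns a library name or path string, or None if nothing is found.
    c_library = ctypes.util.find_library("magic")
    # find_spec returns None when no importable top-level module named "magic" exists;
    # it does not import the module itself.
    spec = importlib.util.find_spec("magic")
    wrapper = spec.origin if spec is not None else None
    return {"libmagic": c_library, "python_magic": wrapper}
```

If either value is None, revisit the corresponding install step above.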

Code Evidence

Dynamic import with fallback (filetype.py:56-60):

import importlib

try:
    # python-magic raises ImportError if the libmagic C library cannot be found
    magic = importlib.import_module("magic")
except ImportError:
    magic = None

Warning when libmagic is unavailable (filetype.py:435-438):

logger.warning(
    "libmagic is unavailable but assists in filetype detection. "
    "Please consider installing libmagic for improved results."
)

Known issue with older libmagic on Docker (filetype.py:638-640):

# NOTE: older libmagic on Docker image may misclassify .json as text/plain

Common Errors

Error Message | Cause | Solution
ImportError: failed to find libmagic | The libmagic C library is not installed on the system | Install it with sudo apt-get install libmagic1 (Ubuntu/Debian) or brew install libmagic (macOS)
WARNING: libmagic is unavailable but assists in filetype detection | The python-magic package is not installed, or the libmagic C library is missing | Install both the system library (sudo apt-get install libmagic1 on Ubuntu/Debian) and the Python package (pip install python-magic)
JSON files detected as text/plain instead of application/json | An older version of libmagic is shipped in the Docker image | Update the Docker base image or install a newer version of libmagic
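
Where updating libmagic is not immediately possible, one conceivable stopgap for the .json misclassification is to re-check content that was reported as text/plain. The sketch below is an assumption for illustration only and is not how filetype.py handles the issue; refine_json_mime is a hypothetical name.

```python
import json


def refine_json_mime(mime_type: str, data: bytes) -> str:
    """Upgrade text/plain to application/json when the content parses as JSON."""
    if mime_type != "text/plain":
        return mime_type
    try:
        parsed = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Not UTF-8 or not valid JSON: keep the original classification.
        return mime_type
    # Only treat objects and arrays as JSON; a bare number or string is valid
    # JSON too, but is far more likely to be ordinary text.
    return "application/json" if isinstance(parsed, (dict, list)) else mime_type
```

Restricting the upgrade to objects and arrays keeps plain text files that happen to contain a single number or quoted string from being relabeled.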

Compatibility Notes

  • The filetype package is used as a fallback when libmagic is unavailable, but it cannot detect textual file types (CSV, EML, HTML, MD, RST, RTF, TSV, TXT)
  • On macOS, install libmagic via Homebrew: brew install libmagic
  • On Windows, additional configuration may be needed; consider using python-magic-bin as an alternative
  • Docker images should use a recent base to avoid the .json misclassification bug
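
Because the textual formats listed above carry no magic-byte signature, the file extension is often the only cheap signal left when libmagic is missing. The sketch below shows an extension lookup under that assumption. The mapping and the name guess_textual_mime are illustrative only; the exact MIME strings used by any particular library may differ.

```python
from pathlib import Path
from typing import Optional

# Illustrative mapping only; not taken from filetype.py.
TEXTUAL_EXTENSIONS = {
    ".csv": "text/csv",
    ".eml": "message/rfc822",
    ".html": "text/html",
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".rtf": "text/rtf",
    ".tsv": "text/tab-separated-values",
    ".txt": "text/plain",
}


def guess_textual_mime(filename: str) -> Optional[str]:
    """Guess a MIME type for a textual file from its extension alone."""
    # Only the final suffix is considered, compared case-insensitively.
    return TEXTUAL_EXTENSIONS.get(Path(filename).suffix.lower())
```

An extension lookup cannot help with extensionless files or raw byte streams, which is where content inspection by libmagic remains necessary.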
