Environment:Unstructured IO Unstructured Libmagic
| Knowledge Sources | |
|---|---|
| Domains | File Type Detection |
| Last Updated | 2026-02-12 09:00 GMT |
Overview
The Libmagic environment provides MIME type detection capabilities via the python-magic wrapper around the libmagic C library, enabling accurate identification of textual file types during partitioning.
Description
The python-magic package serves as the Python binding for the system-level libmagic library, which performs MIME type detection by inspecting file content signatures (magic bytes). In filetype.py (lines 56-60), the module attempts a dynamic import via importlib.import_module("magic") and falls back to the filetype package if libmagic is unavailable. Without libmagic installed, textual file types -- including CSV, EML, HTML, MD, RST, RTF, TSV, and TXT -- cannot be reliably detected, as these formats lack distinct binary magic byte signatures.
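To illustrate why signature-based detection struggles with textual formats, here is a minimal sketch of prefix matching. The `SIGNATURES` table and `sniff_signature` helper are hypothetical names introduced for illustration; they are not part of python-magic, libmagic, or filetype.py, and real detection databases are far larger.

```python
from typing import Optional

# Illustrative subset of well-known file signatures (magic bytes).
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"PK\x03\x04": "application/zip",
}


def sniff_signature(data: bytes) -> Optional[str]:
    """Return a MIME type if the data starts with a known signature, else None."""
    for prefix, mime in SIGNATURES.items():
        if data.startswith(prefix):
            return mime
    return None


pdf_head = b"%PDF-1.7\n"
csv_body = b"name,age\nAda,36\n"
txt_body = b"Ada is 36 years old.\n"

print(sniff_signature(pdf_head))  # application/pdf
print(sniff_signature(csv_body))  # None
print(sniff_signature(txt_body))  # None
```

The CSV and plain-text samples both begin with ordinary characters, so a prefix table alone cannot tell them apart; distinguishing them requires inspecting the content itself.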
When libmagic is unavailable, a warning is emitted (filetype.py, lines 435-438) via the logger, alerting users that file type detection will be degraded. Additionally, a known issue exists where older versions of libmagic shipped on certain Docker images may misclassify .json files as text/plain (filetype.py, lines 638-640).
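The optional-import pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code in filetype.py: `detect_mime` is a hypothetical helper, the module imported as `magic` is assumed to be python-magic, and the standard-library `mimetypes` module stands in for the filetype package used by the real fallback.

```python
import importlib
import logging
import mimetypes
from typing import Optional

logger = logging.getLogger(__name__)

try:
    # python-magic raises ImportError if the libmagic C library cannot be found.
    magic = importlib.import_module("magic")
except ImportError:
    magic = None


def detect_mime(data: bytes, filename: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: use libmagic when available, else guess from the name."""
    if magic is not None:
        # Content-based detection via python-magic.
        return magic.from_buffer(data, mime=True)
    logger.warning("libmagic is unavailable; falling back to an extension-based guess.")
    if filename is None:
        return None
    # Stand-in fallback (assumption): extension lookup, not content inspection.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


print(detect_mime(b"name,age\nAda,36\n", filename="people.csv"))
```

The printed value depends on whether libmagic is installed and on its version, which is the degradation the warning is meant to surface.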
Usage
This environment is required whenever the Detect_Filetype implementation is invoked to determine the MIME type of an input document. It is a core prerequisite for the automatic partitioning pipeline, which routes documents to the correct parser based on detected file type.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | >= 3.11, < 3.14 | Required Python version range |
| OS | Linux (Ubuntu/Debian recommended) | Also works on macOS and Windows with appropriate libmagic installation |
| C Library | libmagic1 | Must be installed at the system level; python-magic is only a wrapper |
Dependencies
System Packages
- libmagic1 -- the core C library for MIME type detection (install via apt-get install libmagic1 on Ubuntu/Debian)
- libmagic-dev -- development headers (may be needed for building python-magic from source)
Python Packages
- python-magic >= 0.4.27, < 1.0.0 -- Python wrapper for libmagic (declared in pyproject.toml)
Credentials
No credentials or API keys are required for this environment.
Quick Install
# Install the system-level libmagic library (Ubuntu/Debian)
sudo apt-get update && sudo apt-get install -y libmagic1
# Install the Python wrapper
pip install "python-magic>=0.4.27,<1.0.0"
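After installing, a quick check along these lines may help confirm that both pieces are in place. `libmagic_status` is a hypothetical helper written for this page; it only reports what the standard library can observe and assumes the importable `magic` module is python-magic.

```python
import importlib
from importlib import metadata


def libmagic_status() -> dict:
    """Report the installed python-magic version and whether `magic` imports."""
    status = {"python_magic_version": None, "magic_importable": False}
    try:
        status["python_magic_version"] = metadata.version("python-magic")
    except metadata.PackageNotFoundError:
        pass  # The pip package is not installed.
    try:
        # With python-magic, this import fails when the libmagic C library is missing.
        importlib.import_module("magic")
        status["magic_importable"] = True
    except ImportError:
        pass
    return status


print(libmagic_status())
```

A version string paired with `magic_importable: False` suggests the pip package is present but the system library is not.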
Code Evidence
Dynamic import with fallback (filetype.py:56-60):
try:
    magic = importlib.import_module("magic")
except ImportError:
    magic = None
Warning when libmagic is unavailable (filetype.py:435-438):
logger.warning(
    "libmagic is unavailable but assists in filetype detection. "
    "Please consider installing libmagic for improved results."
)
Known issue with older libmagic on Docker (filetype.py:638-640):
# NOTE: older libmagic on Docker image may misclassify .json as text/plain
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ImportError: failed to find libmagic | The libmagic C library is not installed on the system | Install via sudo apt-get install libmagic1 (Ubuntu/Debian) or brew install libmagic (macOS) |
| WARNING: libmagic is unavailable but assists in filetype detection | python-magic package not installed or libmagic C library missing | Install both the system library (sudo apt-get install libmagic1 on Ubuntu/Debian) and the Python package (pip install python-magic) |
| JSON files detected as text/plain instead of application/json | Older version of libmagic shipped in Docker image | Update the Docker base image or install a newer version of libmagic |
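Where updating libmagic is not immediately possible, one possible mitigation for the JSON misclassification is a post-check on the detected type. The sketch below is an assumption about how such a check could look; `refine_json_mime` is a hypothetical helper and does not reproduce the logic in filetype.py.

```python
import json


def refine_json_mime(detected_mime: str, data: bytes) -> str:
    """Upgrade text/plain to application/json when the content is a JSON object or array."""
    if detected_mime != "text/plain":
        return detected_mime
    try:
        parsed = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return detected_mime
    # Bare scalars such as "42" are valid JSON but more likely to be plain text.
    if isinstance(parsed, (dict, list)):
        return "application/json"
    return detected_mime


print(refine_json_mime("text/plain", b'{"title": "report", "pages": 3}'))  # application/json
print(refine_json_mime("text/plain", b"Just a sentence."))  # text/plain
```

Parsing the whole payload can be costly for large files, so a check like this is best limited to inputs that were detected as text/plain.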
Compatibility Notes
- The filetype package is used as a fallback when libmagic is unavailable, but it cannot detect textual file types (CSV, EML, HTML, MD, RST, RTF, TSV, TXT)
- On macOS, install libmagic via Homebrew: brew install libmagic
- On Windows, additional configuration may be needed; consider using python-magic-bin as an alternative
- Docker images should use a recent base to avoid the .json misclassification bug