

Implementation:Unstructured IO Unstructured Element Init

From Leeroopedia
Knowledge Sources
Domains Document_Processing, Data_Modeling
Last Updated 2026-02-12 00:00 GMT

Overview

Concrete tool, provided by the Unstructured library, for constructing document element objects.

Description

The Element abstract base class and its subclasses (Text, NarrativeText, Title, Table, Image, etc.) define the core data model for document content. Element.__init__ sets up an element's identity and context: a unique ID (a UUID is generated when none is passed), optional spatial coordinates with their coordinate system, a metadata container, and the detection origin. Text.apply supports post-processing by running one or more cleaner functions over the element's text and storing the result in place.
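To make those constructor defaults concrete, here is a stdlib-only sketch built on hypothetical stand-in classes (SketchMetadata, SketchElement). It is not the library's implementation and runs without Unstructured installed. It assumes, following the Element.__init__ docstring, that a missing element_id becomes a UUID string (generated eagerly here) and that a missing metadata argument becomes an empty container; recording coordinates and detection origin on that metadata object is a further assumption of the sketch. The coordinate system is modeled as a plain string label rather than a CoordinateSystem object.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for ElementMetadata, with only a few fields."""

    page_number: Optional[int] = None
    filename: Optional[str] = None
    coordinates: Optional[tuple[tuple[float, float], ...]] = None
    coordinate_system: Optional[str] = None
    detection_origin: Optional[str] = None


class SketchElement:
    """Hypothetical stand-in mimicking the documented Element.__init__ defaults."""

    def __init__(
        self,
        element_id: Optional[str] = None,
        coordinates: Optional[tuple[tuple[float, float], ...]] = None,
        coordinate_system: Optional[str] = None,
        metadata: Optional[SketchMetadata] = None,
        detection_origin: Optional[str] = None,
    ):
        # A missing ID becomes a random UUID string (generated eagerly in this sketch).
        self.id = element_id if element_id is not None else str(uuid.uuid4())
        # A missing metadata container becomes an empty one.
        self.metadata = metadata if metadata is not None else SketchMetadata()
        # Assumption: spatial info and detection origin are stored on the metadata.
        if coordinates is not None or coordinate_system is not None:
            self.metadata.coordinates = coordinates
            self.metadata.coordinate_system = coordinate_system
        self.metadata.detection_origin = detection_origin


auto = SketchElement()
fixed = SketchElement(element_id="el-1", detection_origin="demo-model")
print(len(auto.id), fixed.id, fixed.metadata.detection_origin)  # 36 el-1 demo-model
```

Each call without an explicit element_id yields a fresh identifier, so two otherwise identical elements remain distinguishable.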

Usage

Import Element subclasses when constructing elements manually (e.g., in custom partitioners or tests), when type-checking elements from partition output, or when applying text cleaning operations via Text.apply.

Code Reference

Source Location

  • Repository: unstructured
  • File: unstructured/documents/elements.py
  • Lines: 662-860

Signature

class Element(abc.ABC):
    def __init__(
        self,
        element_id: Optional[str] = None,
        coordinates: Optional[tuple[tuple[float, float], ...]] = None,
        coordinate_system: Optional[CoordinateSystem] = None,
        metadata: Optional[ElementMetadata] = None,
        detection_origin: Optional[str] = None,
    ):
        """Initialize a document element.

        Args:
            element_id: Unique identifier (auto-generated UUID if None).
            coordinates: Bounding box coordinates as tuple of (x, y) points.
            coordinate_system: Coordinate system for the bounding box.
            metadata: Rich metadata container (ElementMetadata).
            detection_origin: Origin of this element detection (e.g., model name).
        """

class Text(Element):
    def __init__(
        self,
        text: str,
        *args,
        **kwargs,
    ):
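        """Initialize a text-bearing element.

        Args:
            text: The text content of this element.
            *args, **kwargs: Forwarded to Element.__init__ (element_id, coordinates,
                coordinate_system, metadata, detection_origin).
        """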

    def apply(self, *cleaners: Callable[[str], str]):
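        """Apply cleaner functions to the element's text content.

        Cleaners run in the order given, each receiving the previous cleaner's
        output; the final string replaces the element's text in place.
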
        Args:
            cleaners: One or more functions that take a string and return a cleaned string.
        """
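The apply contract can be sketched with a hypothetical SketchText class (not the library's code) so it runs with the standard library alone. Each cleaner receives the output of the previous one, left to right; the final string replaces text in place and the method returns None. Rejecting a non-string result with ValueError is an assumption of this sketch.

```python
from typing import Callable


class SketchText:
    """Hypothetical stand-in for Text, keeping only text, __str__, and apply()."""

    def __init__(self, text: str):
        self.text = text

    def __str__(self) -> str:
        return self.text

    def apply(self, *cleaners: Callable[[str], str]) -> None:
        cleaned = self.text
        # Cleaners run in the order given; each receives the previous result.
        for cleaner in cleaners:
            cleaned = cleaner(cleaned)
        # Assumption: a cleaner that returns a non-string is treated as an error.
        if not isinstance(cleaned, str):
            raise ValueError("Cleaner produced a non-string output.")
        # The element is mutated in place; nothing is returned.
        self.text = cleaned


el = SketchText("  Hello,   World  ")
result = el.apply(str.strip, lambda s: " ".join(s.split()), str.lower)
print(repr(str(el)), result)  # 'hello, world' None
```

Because apply returns None, calls cannot be chained; pass every cleaner to a single apply call instead.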

Import

from unstructured.documents.elements import (
    Element,
    Text,
    NarrativeText,
    Title,
    Table,
    Image,
    ListItem,
    Header,
    Footer,
    FigureCaption,
    CompositeElement,
    ElementMetadata,
)

I/O Contract

Inputs (Element.__init__)

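Name | Type | Required | Description
element_id | Optional[str] | No | Unique identifier (a UUID is generated if None)
coordinates | Optional[tuple[tuple[float, float], ...]] | No | Bounding box as a tuple of (x, y) points
coordinate_system | Optional[CoordinateSystem] | No | Coordinate system the points are expressed in
metadata | Optional[ElementMetadata] | No | Rich metadata container
detection_origin | Optional[str] | No | Source of the element detection (e.g., model name)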
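The coordinates input is a tuple of (x, y) points rather than a single box. When an axis-aligned (min_x, min_y, max_x, max_y) box is needed, for example to compare regions, it can be derived from the points as below. points_to_bbox is a hypothetical helper, not part of the library, and the four-corner input is illustrative only.

```python
from typing import Sequence


def points_to_bbox(
    points: Sequence[tuple[float, float]],
) -> tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) enclosing all (x, y) points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Illustrative four-corner box, in the shape accepted by `coordinates`.
corners = ((10.0, 20.0), (10.0, 50.0), (110.0, 50.0), (110.0, 20.0))
print(points_to_bbox(corners))  # (10.0, 20.0, 110.0, 50.0)
```

Taking the min and max over all points makes the helper independent of the order in which the corners are listed.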

Inputs (Text.apply)

Name | Type | Required | Description
*cleaners | Callable[[str], str] | Yes | One or more text cleaning functions, applied in order

Outputs

Name | Type | Description
Element instance | Element subclass | Constructed element with an ID, metadata, and optional coordinates
apply (side effect) | None | Returns None; modifies the element's text in place

Usage Examples

Create Elements Manually

from unstructured.documents.elements import NarrativeText, Title, ElementMetadata

title = Title(
    text="Introduction",
    metadata=ElementMetadata(page_number=1, filename="report.pdf"),
)

paragraph = NarrativeText(
    text="This report summarizes the findings of our analysis.",
    metadata=ElementMetadata(page_number=1, filename="report.pdf"),
)

Apply Text Cleaners

from unstructured.documents.elements import NarrativeText

element = NarrativeText(text="  Extra   whitespace   here  ")

# Apply a simple whitespace normalizer
element.apply(lambda s: " ".join(s.split()))
print(str(element))  # "Extra whitespace here"

Type-Check Partition Output

from unstructured.partition.auto import partition
from unstructured.documents.elements import Title, Table

elements = partition(filename="report.pdf")

titles = [el for el in elements if isinstance(el, Title)]
tables = [el for el in elements if isinstance(el, Table)]
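Because isinstance respects inheritance, filtering on a base class also matches its subclasses. In the library's class hierarchy, text-bearing types such as Title, NarrativeText, and Table derive from Text, so isinstance(el, Text) matches all of them. The sketch below mirrors that hierarchy with hypothetical stand-in classes so it runs without the library, and contrasts isinstance filtering with an exact-type count via collections.Counter.

```python
from collections import Counter


# Hypothetical stand-ins mirroring the library's hierarchy: all derive from Text.
class Text:
    def __init__(self, text: str):
        self.text = text


class Title(Text):
    pass


class NarrativeText(Text):
    pass


class Table(Text):
    pass


elements = [
    Title("Introduction"),
    NarrativeText("This report summarizes the findings."),
    Table("a | b"),
    NarrativeText("More detail follows."),
]

# isinstance() matches subclasses, so filtering on Text keeps every element.
text_like = [el for el in elements if isinstance(el, Text)]
# type(el).__name__ counts each concrete class separately.
counts = Counter(type(el).__name__ for el in elements)
print(len(text_like), counts["NarrativeText"], counts["Text"])  # 4 2 0
```

Elements returned by the real library also carry a category string (for example "Title"), which can stand in for type(el).__name__ when counting.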

Related Pages

Implements Principle
