Implementation:Unstructured IO Unstructured Element Init

Knowledge Sources	Unstructured
Domains	Document_Processing, Data_Modeling
Last Updated	2026-02-12 00:00 GMT

Overview

Concrete tool, provided by the Unstructured library, for constructing document element objects.

Description

The Element abstract base class and its subclasses (Text, NarrativeText, Title, Table, Image, etc.) define the core data model for document content. The __init__ method initializes an element with a unique ID, optional spatial coordinates, metadata, and detection origin. The Text.apply method enables post-processing by applying cleaner functions to element text content.
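
To make the shape of this data model concrete, here is a minimal sketch built from simplified stand-in classes. The names SketchElement, SketchText, and SketchTitle are hypothetical and invented for illustration. They mirror only the behaviour described above (an auto-generated ID, optional metadata, cleaners applied to text) and are not the library's implementation; a plain dict stands in for ElementMetadata.

```python
import abc
import uuid
from typing import Callable, Optional


class SketchElement(abc.ABC):
    """Hypothetical stand-in for the Element base class (not library code)."""

    def __init__(self, element_id: Optional[str] = None, metadata: Optional[dict] = None):
        # Fall back to a fresh UUID string when no ID is supplied.
        self.id = element_id if element_id is not None else str(uuid.uuid4())
        # A plain dict stands in for ElementMetadata in this sketch.
        self.metadata = metadata if metadata is not None else {}


class SketchText(SketchElement):
    """Hypothetical stand-in for a text-bearing element."""

    def __init__(self, text: str, **kwargs):
        super().__init__(**kwargs)
        self.text = text

    def apply(self, *cleaners: Callable[[str], str]) -> None:
        # Run each cleaner in order, feeding the result of one into the next.
        cleaned = self.text
        for cleaner in cleaners:
            cleaned = cleaner(cleaned)
        self.text = cleaned


class SketchTitle(SketchText):
    """Subclasses carry the element category through their type."""


title = SketchTitle("  Introduction  ", metadata={"page_number": 1})
title.apply(str.strip, str.upper)
print(title.text)
```

The category of an element is expressed through its class rather than a field, which is why isinstance checks are the natural way to filter elements later on.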

Usage

Import Element subclasses when constructing elements manually (e.g., in custom partitioners or tests), when type-checking elements from partition output, or when applying text cleaning operations via Text.apply.

Code Reference

Source Location

Repository: unstructured
File: unstructured/documents/elements.py
Lines: 662-860

Signature

class Element(abc.ABC):
    def __init__(
        self,
        element_id: Optional[str] = None,
        coordinates: Optional[tuple[tuple[float, float], ...]] = None,
        coordinate_system: Optional[CoordinateSystem] = None,
        metadata: Optional[ElementMetadata] = None,
        detection_origin: Optional[str] = None,
    ):
        """Initialize a document element.

        Args:
            element_id: Unique identifier (auto-generated UUID if None).
            coordinates: Bounding box coordinates as tuple of (x, y) points.
            coordinate_system: Coordinate system for the bounding box.
            metadata: Rich metadata container (ElementMetadata).
            detection_origin: Origin of this element detection (e.g., model name).
        """

class Text(Element):
    def __init__(
        self,
        text: str,
        *args,
        **kwargs,
    ):
        """Initialize a text-bearing element.

        Args:
            text: The text content of this element.
        """

    def apply(self, *cleaners: Callable[[str], str]):
        """Apply cleaner functions to the element's text content.

        Args:
            cleaners: One or more functions that take a string and return a cleaned string.
        """

Import

from unstructured.documents.elements import (
    Element,
    Text,
    NarrativeText,
    Title,
    Table,
    Image,
    ListItem,
    Header,
    Footer,
    FigureCaption,
    CompositeElement,
    ElementMetadata,
)

I/O Contract

Inputs (Element.__init__)

Name	Type	Required	Description
element_id	Optional[str]	No	Unique identifier (auto-generated if None)
coordinates	Optional[tuple[tuple[float, float], ...]]	No	Bounding box as tuple of (x, y) points
coordinate_system	Optional[CoordinateSystem]	No	Coordinate reference system
metadata	Optional[ElementMetadata]	No	Rich metadata container
detection_origin	Optional[str]	No	Source of element detection (e.g., model name)
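
The coordinates argument is a tuple of (x, y) float pairs. The sketch below builds such a tuple from two opposite corners of a rectangle. The helper name box_to_points and the corner ordering (top-left, bottom-left, bottom-right, top-right, with y growing downward) are assumptions made for illustration; check the library's coordinate conventions before relying on a particular order.

```python
from typing import Tuple

Point = Tuple[float, float]


def box_to_points(x0: float, y0: float, x1: float, y1: float) -> Tuple[Point, ...]:
    """Hypothetical helper: expand two opposite corners into four corner points."""
    left, right = min(x0, x1), max(x0, x1)
    top, bottom = min(y0, y1), max(y0, y1)  # assumes y grows downward
    return (
        (float(left), float(top)),
        (float(left), float(bottom)),
        (float(right), float(bottom)),
        (float(right), float(top)),
    )


points = box_to_points(10, 20, 110, 60)
```

A tuple shaped like points matches the annotated type of the coordinates parameter in the signature above.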

Inputs (Text.apply)

Name	Type	Required	Description
cleaners	Callable[[str], str]	Yes	One or more text cleaning functions
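
Because each cleaner maps a string to a string, several cleaners can be passed in one call. The sketch below shows how such a chain might be evaluated, assuming cleaners run left to right. The names collapse_whitespace, strip_bullets, and run_cleaners are hypothetical; the library ships its own cleaner functions, which would normally be used instead.

```python
import re
from functools import reduce
from typing import Callable


def collapse_whitespace(text: str) -> str:
    """Replace runs of whitespace with one space and trim both ends."""
    return " ".join(text.split())


def strip_bullets(text: str) -> str:
    """Remove a leading bullet marker such as '-', '*', or a bullet dot."""
    return re.sub(r"^\s*[-*\u2022]\s*", "", text)


def run_cleaners(text: str, *cleaners: Callable[[str], str]) -> str:
    """Apply cleaners left to right (assumed order) and reject non-string results."""
    result = reduce(lambda acc, cleaner: cleaner(acc), cleaners, text)
    if not isinstance(result, str):
        raise TypeError("every cleaner must return a str")
    return result


cleaned = run_cleaners("  \u2022  First   item  ", strip_bullets, collapse_whitespace)
print(cleaned)
```

Order matters when cleaners interact: here the bullet is removed first so that the whitespace pass also trims the gap the bullet leaves behind.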

Outputs

Name	Type	Description
Element instance	Element subclass	Constructed element with ID, metadata, and optional coordinates
apply (side effect)	None	Replaces the element's text in place with the cleaned result

Usage Examples

Create Elements Manually

from unstructured.documents.elements import NarrativeText, Title, ElementMetadata

title = Title(
    text="Introduction",
    metadata=ElementMetadata(page_number=1, filename="report.pdf"),
)

paragraph = NarrativeText(
    text="This report summarizes the findings of our analysis.",
    metadata=ElementMetadata(page_number=1, filename="report.pdf"),
)

Apply Text Cleaners

from unstructured.documents.elements import NarrativeText

element = NarrativeText(text="  Extra   whitespace   here  ")

# Apply a simple whitespace normalizer
element.apply(lambda s: " ".join(s.split()))
print(str(element))  # "Extra whitespace here"

Type-Check Partition Output

from unstructured.partition.auto import partition
from unstructured.documents.elements import Title, Table

elements = partition(filename="report.pdf")

titles = [el for el in elements if isinstance(el, Title)]
tables = [el for el in elements if isinstance(el, Table)]
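
When several categories are needed at once, repeating one list comprehension per type becomes verbose. A minimal sketch of a reusable grouping helper follows. The name group_by_type is hypothetical, and the three small classes are stand-ins so that the example runs without the library installed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class FakeText:
    """Stand-in for a text element (not the library class)."""


class FakeTitle(FakeText):
    """Stand-in for a title element."""


class FakeTable(FakeText):
    """Stand-in for a table element."""


def group_by_type(elements: Iterable[object]) -> Dict[str, List[object]]:
    """Bucket elements by the name of their concrete class."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for el in elements:
        groups[type(el).__name__].append(el)
    return dict(groups)


elements = [FakeTitle(), FakeText(), FakeTable(), FakeTitle()]
groups = group_by_type(elements)
print({name: len(items) for name, items in groups.items()})
```

Grouping on the concrete class name differs from isinstance: a FakeTitle lands only in the "FakeTitle" bucket, whereas isinstance(el, FakeText) would also match it.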

Related Pages

Implements Principle

Principle:Unstructured_IO_Unstructured_Element_Model
