Implementation:Unstructured IO Unstructured Chunk By Title

Knowledge Sources	Unstructured
Domains	Document_Processing, RAG, Text_Splitting
Last Updated	2026-02-12 00:00 GMT

Overview

A concrete tool in the Unstructured library for section-aware chunking. It groups partitioned document elements into chunks that break at title boundaries.

Description

The chunk_by_title function implements section-aware chunking. It starts a new chunk whenever it encounters a Title element, so chunk boundaries follow the document's section structure while the size limits are still enforced. It can also merge consecutive undersized sections and control whether chunks may span page boundaries. It accepts the same size and overlap parameters as basic chunking.
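The title-boundary rule and the merging of undersized sections can be modeled in a few lines of plain Python. This is a simplified sketch, not the library's implementation, and it rests on several assumptions:

- Elem is a hypothetical stand-in for an Unstructured Element.
- The "\n\n" separator used to measure section length is assumed.
- The merge rule only approximates combine_text_under_n_chars. It folds a section into its predecessor while the predecessor is under the threshold and the result still fits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Elem:
    """Hypothetical stand-in for an Unstructured Element (not the library class)."""

    category: str  # e.g. "Title" or "NarrativeText"
    text: str


def group_into_sections(elements: List[Elem]) -> List[List[Elem]]:
    """Open a new section at every Title element (and at the very first element)."""
    sections: List[List[Elem]] = []
    for el in elements:
        if el.category == "Title" or not sections:
            sections.append([])
        sections[-1].append(el)
    return sections


def section_length(section: List[Elem]) -> int:
    # Assumption: element texts are joined with a blank line when measured.
    return len("\n\n".join(el.text for el in section))


def combine_small_sections(
    sections: List[List[Elem]], combine_under: int, max_chars: int
) -> List[List[Elem]]:
    """Fold each section into the previous one while the previous one is shorter
    than `combine_under` and the merged result still fits within `max_chars`."""
    combined: List[List[Elem]] = []
    for section in sections:
        if (
            combined
            and section_length(combined[-1]) < combine_under
            and section_length(combined[-1] + section) <= max_chars
        ):
            combined[-1] = combined[-1] + section
        else:
            combined.append(list(section))
    return combined


elements = [
    Elem("Title", "Intro"),
    Elem("NarrativeText", "a" * 10),
    Elem("Title", "Scope"),
    Elem("NarrativeText", "b" * 10),
    Elem("Title", "Details"),
    Elem("NarrativeText", "c" * 300),
]
sections = group_into_sections(elements)
merged = combine_small_sections(sections, combine_under=100, max_chars=200)
print([len(s) for s in sections], [len(s) for s in merged])  # [2, 2, 2] [4, 2]
```

The two short sections ("Intro" and "Scope") are merged. The long "Details" section stays on its own because adding it would exceed max_chars.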

Usage

Import this function when you need chunks that respect document section structure. This is the recommended chunking strategy for RAG pipelines processing structured documents like reports, papers, and manuals where topical coherence matters for retrieval quality.

Code Reference

Source Location

Repository: unstructured
File: unstructured/chunking/title.py
Lines: 23-99

Signature

def chunk_by_title(
    elements: Iterable[Element],
    *,
    combine_text_under_n_chars: Optional[int] = None,
    include_orig_elements: Optional[bool] = None,
    max_characters: Optional[int] = None,
    max_tokens: Optional[int] = None,
    multipage_sections: Optional[bool] = None,
    new_after_n_chars: Optional[int] = None,
    new_after_n_tokens: Optional[int] = None,
    overlap: Optional[int] = None,
    overlap_all: Optional[bool] = None,
    tokenizer: Optional[str] = None,
) -> list[Element]:
    """Chunk elements at title boundaries with size constraints.

    Args:
        elements: Iterable of Element objects to chunk.
        combine_text_under_n_chars: Merge sections smaller than this threshold.
        include_orig_elements: Preserve original elements in chunk metadata.
        max_characters: Hard maximum chunk size in characters (default 500).
        max_tokens: Hard maximum chunk size in tokens.
        multipage_sections: Allow chunks to span page boundaries (default True).
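        new_after_n_chars: Soft maximum in characters; a chunk is closed once it reaches this length.
        new_after_n_tokens: Soft maximum in tokens.
        overlap: Number of trailing characters from one chunk to prefix to the next.
        overlap_all: Also apply overlap between chunks formed from whole elements
            (by default, overlap applies only where an oversized element is text-split).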
        tokenizer: Tokenizer name for token-based chunking.
    Returns:
        List of chunked elements respecting section boundaries.
    """

Import

from unstructured.chunking.title import chunk_by_title
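The difference between the hard maximum (max_characters) and the soft maximum (new_after_n_chars) can be illustrated with a simplified greedy packer over plain strings. This sketch is not chunk_by_title itself, and it makes several simplifications:

- pack_texts is a hypothetical name.
- It ignores titles and tokens.
- It assumes element texts are joined with "\n\n".
- It assumes the soft maximum is capped at the hard maximum.

```python
from typing import List, Optional


def pack_texts(
    texts: List[str],
    max_characters: int = 500,
    new_after_n_chars: Optional[int] = None,
    sep: str = "\n\n",
) -> List[str]:
    """Greedily pack element texts into chunks under a soft and a hard maximum."""
    # Assumption: the soft max defaults to, and is capped at, the hard max.
    soft_max = max_characters if new_after_n_chars is None else min(new_after_n_chars, max_characters)
    chunks: List[str] = []
    current = ""
    for text in texts:
        if len(text) > max_characters:
            # Oversized element: close the open chunk, then split the text at the hard max.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(text[i:i + max_characters] for i in range(0, len(text), max_characters))
            continue
        candidate = f"{current}{sep}{text}" if current else text
        if len(candidate) > max_characters:
            # Adding this element would exceed the hard max, so start a new chunk with it.
            chunks.append(current)
            current = text
        else:
            current = candidate
        if len(current) >= soft_max:
            # Soft max reached: close the chunk even though more text would fit.
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks


print([len(c) for c in pack_texts(["a" * 100, "b" * 100, "c" * 100], new_after_n_chars=150)])
# [202, 100]
```

With a soft maximum of 150, the first chunk closes as soon as it passes 150 characters, even though the hard maximum of 500 would allow more text.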

I/O Contract

Inputs

Name	Type	Required	Description
elements	Iterable[Element]	Yes	Elements from partitioning
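max_characters	Optional[int]	No	Hard maximum chunk size in characters (default 500)
new_after_n_chars	Optional[int]	No	Soft maximum; a chunk is closed once it reaches this length
combine_text_under_n_chars	Optional[int]	No	Merge consecutive sections shorter than this threshold
multipage_sections	Optional[bool]	No	Allow chunks to span page boundaries (default True)
overlap	Optional[int]	No	Number of trailing characters of one chunk prefixed to the next
overlap_all	Optional[bool]	No	Apply overlap between all consecutive chunks, not only text-split ones
include_orig_elements	Optional[bool]	No	Store the original elements in chunk metadata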
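The overlap parameter can be pictured as copying the tail of each chunk onto the front of the next. The sketch below applies that to every pair of consecutive chunks, which loosely corresponds to overlap_all=True. By default the library applies overlap only between the pieces of an oversized element that had to be text-split.

This is a simplified model with the following assumptions:

- apply_overlap is a hypothetical name.
- The single-space separator is assumed.
- The resulting lengths are not re-checked against max_characters.

```python
from typing import List


def apply_overlap(chunks: List[str], overlap: int, sep: str = " ") -> List[str]:
    """Prefix each chunk with the last `overlap` characters of the chunk before it."""
    if overlap <= 0:
        return list(chunks)
    result = chunks[:1]
    for prev, cur in zip(chunks, chunks[1:]):
        # Take the tail from the original previous chunk, not the already-prefixed one.
        result.append(f"{prev[-overlap:]}{sep}{cur}")
    return result


print(apply_overlap(["The quick brown fox", "jumps over the dog"], overlap=5))
# ['The quick brown fox', 'n fox jumps over the dog']
```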

Outputs

Name	Type	Description
return	list[Element]	Chunked elements aligned to section boundaries: CompositeElement for text, Table for tables that fit within the size limit, TableChunk for split tables

Usage Examples

Section-Aware Chunking for RAG

from unstructured.partition.auto import partition
from unstructured.chunking.title import chunk_by_title

elements = partition(filename="annual_report.pdf", strategy="hi_res")

chunks = chunk_by_title(
    elements,
    max_characters=1500,
    new_after_n_chars=1200,
    combine_text_under_n_chars=200,
    overlap=100,
)

for chunk in chunks:
    print(f"Length: {len(str(chunk))}, Text: {str(chunk)[:60]}...")
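Printing each chunk quickly becomes noisy for long reports. A hypothetical helper like the one below (summarize_lengths is an illustrative name, not part of Unstructured) condenses the lengths into a few statistics. It also counts how many chunks exceed a chosen limit.

It calls str() on each item, so it accepts the chunks from the example above as well as plain strings. For example, call summarize_lengths(chunks, limit=1500).

```python
from statistics import mean
from typing import Dict, Iterable


def summarize_lengths(chunks: Iterable[object], limit: int) -> Dict[str, float]:
    """Summarize chunk lengths; works on Elements (via str()) or plain strings."""
    lengths = [len(str(c)) for c in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "mean": 0.0, "max": 0, "over_limit": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
        # Number of chunks longer than the limit passed in.
        "over_limit": sum(n > limit for n in lengths),
    }


print(summarize_lengths(["aa", "bbbb", "c" * 10], limit=5))
```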

Page-Aligned Chunks

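from unstructured.partition.auto import partition
from unstructured.chunking.title import chunk_by_title

elements = partition(filename="annual_report.pdf")

# Start a new chunk at every page break so that no chunk spans two pages
chunks = chunk_by_title(
    elements,
    max_characters=1000,
    multipage_sections=False,
)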
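Setting multipage_sections=False adds a second boundary rule: a page break also closes the current section. The sketch below models only that boundary rule and ignores size limits. It assumes each element carries a page number; Unstructured keeps this in metadata.page_number, but here PageElem is a hypothetical stand-in.

```python
from typing import List, NamedTuple


class PageElem(NamedTuple):
    """Hypothetical stand-in for an Element plus its page number."""

    category: str
    page_number: int
    text: str


def split_sections(elements: List[PageElem], multipage_sections: bool = True) -> List[List[PageElem]]:
    """Open a new section at each Title and, if multipage_sections is False, at each page break."""
    sections: List[List[PageElem]] = []
    for el in elements:
        page_changed = bool(sections) and el.page_number != sections[-1][-1].page_number
        if not sections or el.category == "Title" or (page_changed and not multipage_sections):
            sections.append([])
        sections[-1].append(el)
    return sections


doc = [
    PageElem("Title", 1, "Intro"),
    PageElem("NarrativeText", 1, "p1"),
    PageElem("NarrativeText", 2, "p2"),
    PageElem("Title", 2, "Next"),
    PageElem("NarrativeText", 2, "p3"),
]
print([len(s) for s in split_sections(doc)])                            # [3, 2]
print([len(s) for s in split_sections(doc, multipage_sections=False)])  # [2, 1, 2]
```

With page spanning allowed, "p2" stays in the Intro section. With it disabled, "p2" becomes its own section because it starts on page 2.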

Via Dispatch Function

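from unstructured.partition.auto import partition
from unstructured.chunking.dispatch import chunk

elements = partition(filename="annual_report.pdf")

# Select the title-aware strategy by name instead of importing chunk_by_title directly
chunks = chunk(
    elements,
    chunking_strategy="by_title",
    max_characters=1000,
    combine_text_under_n_chars=200,
    include_orig_elements=True,
)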
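Selecting a chunker by a string name, as chunking_strategy="by_title" does in the example above, is a name-to-function dispatch. The sketch below shows that general pattern with a hypothetical registry. The names register_strategy, toy_chunk_by_title and dispatch_chunk are illustrative and not part of Unstructured. The sketch makes no claim about how the library's own dispatcher is implemented.

```python
from typing import Any, Callable, Dict, List

Chunker = Callable[..., List[Any]]

# Hypothetical registry mapping strategy names to chunking functions.
_STRATEGIES: Dict[str, Chunker] = {}


def register_strategy(name: str) -> Callable[[Chunker], Chunker]:
    """Decorator that records a chunking function under a strategy name."""

    def decorator(fn: Chunker) -> Chunker:
        _STRATEGIES[name] = fn
        return fn

    return decorator


@register_strategy("by_title")
def toy_chunk_by_title(elements: List[Any], **options: Any) -> List[Any]:
    # Placeholder: a real strategy would return chunks; this echoes the call.
    return [("by_title", list(elements), options)]


def dispatch_chunk(elements: List[Any], chunking_strategy: str, **options: Any) -> List[Any]:
    """Look up the strategy by name and forward the remaining options to it."""
    try:
        chunker = _STRATEGIES[chunking_strategy]
    except KeyError:
        raise ValueError(f"unknown chunking_strategy: {chunking_strategy!r}") from None
    return chunker(elements, **options)


print(dispatch_chunk(["e1", "e2"], "by_title", max_characters=1000))
# [('by_title', ['e1', 'e2'], {'max_characters': 1000})]
```

The design keeps callers decoupled from specific chunking functions: a configuration string is enough to pick one, and an unknown name fails loudly instead of silently falling back.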

Related Pages

Implements Principle

Principle:Unstructured_IO_Unstructured_Section_Aware_Chunking

Uses Heuristic

Heuristic:Unstructured_IO_Unstructured_Chunk_Size_Tuning
