Implementation:CrewAIInc CrewAI RAG DOCX Loader

Knowledge Sources	CrewAI
Domains	RAG, Data_Loading
Last Updated	2026-02-11 00:00 GMT

Overview

Loads and extracts text content from Microsoft Word DOCX files, supporting both local file paths and remote URLs.

Description

DOCXLoader extends BaseLoader to handle DOCX document processing. It requires the python-docx library, which is lazily imported at load time to provide a clear error message if not installed.

For URL sources, the loader downloads the DOCX file to a temporary file using requests with appropriate MIME type headers, processes it, and ensures cleanup of the temporary file via a try/finally block. For local files, it loads directly from the file path.

The _load_from_file() method uses python-docx to open the document, iterates through all paragraphs, filters out empty ones, and joins them with newline separators. The resulting LoaderResult includes metadata about the format, total paragraph count, and table count in the document.

Usage

Import DOCXLoader when you need to explicitly load DOCX files. It is typically instantiated automatically by the DataType.DOCX registry when .docx files are detected.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/loaders/docx_loader.py
Lines: 1-86

Signature

class DOCXLoader(BaseLoader):
    def load(self, source_content: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.docx_loader import DOCXLoader

I/O Contract

Inputs

Name	Type	Required	Description
source_content	SourceContent	Yes	Wraps a DOCX file path or URL
**kwargs	Any	No	Additional keyword arguments; headers can be provided for URL downloads

Outputs

Name	Type	Description
return	LoaderResult	Contains extracted paragraph text, source reference, and metadata (format, paragraphs count, tables count)

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.docx_loader import DOCXLoader
from crewai_tools.rag.source_content import SourceContent

loader = DOCXLoader()

# Load from a local file
source = SourceContent("/path/to/report.docx")
result = loader.load(source)
print(result.content)       # Extracted text from all paragraphs
print(result.metadata)
# {'format': 'docx', 'paragraphs': 42, 'tables': 3}

# Load from a URL
source = SourceContent("https://example.com/files/report.docx")
result = loader.load(source)

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment