Implementation:CrewAIInc CrewAI RAG XML Loader

Knowledge Sources	CrewAI
Domains	RAG, Data_Loading
Last Updated	2026-02-11 00:00 GMT

Overview

Loads and parses XML files from local paths or URLs, extracting all text content while preserving root element metadata.

Description

XMLLoader extends BaseLoader to handle XML data from multiple sources. It reads content from local files (with UTF-8 encoding) or fetches from URLs using the shared load_from_url utility with XML-compatible accept headers ("application/xml, text/xml, text/plain").

The _parse_xml() method uses Python's xml.etree.ElementTree module. If the content starts with "<", it parses using fromstring() (for in-memory XML strings). Otherwise, it falls back to parse() for file-based parsing. All text nodes are extracted via itertext(), filtered to remove empty/whitespace-only entries, and joined with newline separators.

The returned LoaderResult includes metadata about the format ("xml") and the root_tag of the XML document. Parse errors from ParseError are handled gracefully by returning the raw content with error details in the metadata.

Usage

Import XMLLoader when you need to explicitly load XML data. It is typically instantiated automatically by the DataType.XML registry when .xml files or URLs with XML content are detected.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/loaders/xml_loader.py
Lines: 1-63

Signature

class XMLLoader(BaseLoader):
    def load(self, source_content: SourceContent, **kwargs: Any) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.xml_loader import XMLLoader

I/O Contract

Inputs

Name	Type	Required	Description
source_content	SourceContent	Yes	Wraps an XML file path, URL, or raw XML string
**kwargs	Any	No	Additional keyword arguments passed to URL loading

Outputs

Name	Type	Description
return	LoaderResult	Contains extracted text content from all XML nodes; metadata includes format ("xml") and root_tag

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.xml_loader import XMLLoader
from crewai_tools.rag.source_content import SourceContent

loader = XMLLoader()

# Load from a local file
source = SourceContent("/path/to/data.xml")
result = loader.load(source)
print(result.content)
# Extracted text from all XML elements, one per line

print(result.metadata)
# {'format': 'xml', 'root_tag': 'catalog'}

# Load from a URL
source = SourceContent("https://example.com/feed.xml")
result = loader.load(source)

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment