Implementation:CrewAIInc CrewAI RAG XML Loader
| Knowledge Sources | |
|---|---|
| Domains | RAG, Data_Loading |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Loads and parses XML files from local paths or URLs, extracting all text content while preserving root element metadata.
Description
XMLLoader extends BaseLoader to handle XML data from multiple sources. It reads content from local files (with UTF-8 encoding) or fetches from URLs using the shared load_from_url utility with XML-compatible accept headers ("application/xml, text/xml, text/plain").
The _parse_xml() method uses Python's xml.etree.ElementTree module. If the content starts with "<", it parses using fromstring() (for in-memory XML strings). Otherwise, it falls back to parse() for file-based parsing. All text nodes are extracted via itertext(), filtered to remove empty/whitespace-only entries, and joined with newline separators.
The returned LoaderResult includes metadata about the format ("xml") and the root_tag of the XML document. Parse errors from ParseError are handled gracefully by returning the raw content with error details in the metadata.
Usage
Import XMLLoader when you need to explicitly load XML data. It is typically instantiated automatically by the DataType.XML registry when .xml files or URLs with XML content are detected.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/rag/loaders/xml_loader.py
- Lines: 1-63
Signature
class XMLLoader(BaseLoader):
def load(self, source_content: SourceContent, **kwargs: Any) -> LoaderResult: ...
Import
from crewai_tools.rag.loaders.xml_loader import XMLLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source_content | SourceContent | Yes | Wraps an XML file path, URL, or raw XML string |
| **kwargs | Any | No | Additional keyword arguments passed to URL loading |
Outputs
| Name | Type | Description |
|---|---|---|
| return | LoaderResult | Contains extracted text content from all XML nodes; metadata includes format ("xml") and root_tag |
Usage Examples
Basic Usage
from crewai_tools.rag.loaders.xml_loader import XMLLoader
from crewai_tools.rag.source_content import SourceContent
loader = XMLLoader()
# Load from a local file
source = SourceContent("/path/to/data.xml")
result = loader.load(source)
print(result.content)
# Extracted text from all XML elements, one per line
print(result.metadata)
# {'format': 'xml', 'root_tag': 'catalog'}
# Load from a URL
source = SourceContent("https://example.com/feed.xml")
result = loader.load(source)