Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:CrewAIInc CrewAI RAG XML Loader

From Leeroopedia
Knowledge Sources
Domains RAG, Data_Loading
Last Updated 2026-02-11 00:00 GMT

Overview

Loads and parses XML files from local paths or URLs, extracting all text content while preserving root element metadata.

Description

XMLLoader extends BaseLoader to handle XML data from multiple sources. It reads content from local files (with UTF-8 encoding) or fetches from URLs using the shared load_from_url utility with XML-compatible accept headers ("application/xml, text/xml, text/plain").

The _parse_xml() method uses Python's xml.etree.ElementTree module. If the content starts with "<", it parses using fromstring() (for in-memory XML strings). Otherwise, it falls back to parse() for file-based parsing. All text nodes are extracted via itertext(), filtered to remove empty/whitespace-only entries, and joined with newline separators.

The returned LoaderResult includes metadata about the format ("xml") and the root_tag of the XML document. Parse errors from ParseError are handled gracefully by returning the raw content with error details in the metadata.

Usage

Import XMLLoader when you need to explicitly load XML data. It is typically instantiated automatically by the DataType.XML registry when .xml files or URLs with XML content are detected.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/loaders/xml_loader.py
  • Lines: 1-63

Signature

class XMLLoader(BaseLoader):
    def load(self, source_content: SourceContent, **kwargs: Any) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.xml_loader import XMLLoader

I/O Contract

Inputs

Name Type Required Description
source_content SourceContent Yes Wraps an XML file path, URL, or raw XML string
**kwargs Any No Additional keyword arguments passed to URL loading

Outputs

Name Type Description
return LoaderResult Contains extracted text content from all XML nodes; metadata includes format ("xml") and root_tag

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.xml_loader import XMLLoader
from crewai_tools.rag.source_content import SourceContent

loader = XMLLoader()

# Load from a local file
source = SourceContent("/path/to/data.xml")
result = loader.load(source)
print(result.content)
# Extracted text from all XML elements, one per line

print(result.metadata)
# {'format': 'xml', 'root_tag': 'catalog'}

# Load from a URL
source = SourceContent("https://example.com/feed.xml")
result = loader.load(source)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment