Implementation:CrewAIInc CrewAI RAG Directory Loader

From Leeroopedia
Knowledge Sources
Domains: RAG, Data_Loading
Last Updated: 2026-02-11 00:00 GMT

Overview

Loads and processes the files in a local directory (recursively by default), selecting the appropriate loader for each file type and aggregating the content into a single result.

Description

DirectoryLoader extends BaseLoader to traverse directory structures and process files in bulk. It supports several filtering options: recursive traversal (default True), include_extensions and exclude_extensions lists for file type filtering, and max_files to cap the number of processed files. Hidden files and directories (starting with ".") are automatically skipped.
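The filtering rules described above can be sketched with the standard library alone. The helper below, find_files, is a hypothetical stand-in and not the loader's actual code: its name, the extension normalization, the sorted traversal order, and the point at which max_files is applied are all assumptions made for illustration.

```python
import os


def find_files(root, recursive=True, include_extensions=None,
               exclude_extensions=None, max_files=None):
    """Hypothetical sketch of directory traversal with filtering."""

    def normalize(exts):
        # Accept "pdf" as well as ".pdf" and compare case-insensitively (assumption).
        return {(e if e.startswith(".") else "." + e).lower() for e in (exts or [])}

    include = normalize(include_extensions)
    exclude = normalize(exclude_extensions)
    found = []

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if not recursive:
            dirnames.clear()  # stop after the top-level directory

        for name in sorted(filenames):
            if name.startswith("."):
                continue  # skip hidden files
            ext = os.path.splitext(name)[1].lower()
            if include and ext not in include:
                continue
            if ext in exclude:
                continue
            found.append(os.path.join(dirpath, name))

    # Cap the result only after filtering (assumption about ordering).
    return found if max_files is None else found[:max_files]
```

Pruning dirnames in place is what keeps hidden directories such as .git from being visited at all, rather than filtering their files out afterwards.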

For each file found, the loader uses DataTypes.from_content() to detect the file type and delegates to the matching specialized loader (e.g., PDFLoader for .pdf files, JSONLoader for .json files). Each file's content is prefixed with a header ("=== File: path ==="). Errors raised while processing an individual file are captured rather than halting the overall operation.
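The per-file header and the capture of per-file errors can be illustrated as follows. This is a minimal sketch, not the library's implementation: combine_files and the load_one callable (standing in for whichever specialized loader is selected) are hypothetical, as are the separator between sections and the shape of each error record.

```python
def combine_files(paths, load_one):
    """Hypothetical sketch: load_one(path) -> str stands in for a specialized loader."""
    sections = []       # one header-prefixed block per successfully loaded file
    file_details = []   # bookkeeping for successes
    error_details = []  # bookkeeping for failures

    for path in paths:
        try:
            text = load_one(path)
        except Exception as exc:
            # Record the failure and move on; one bad file does not stop the run.
            error_details.append({"path": path, "error": str(exc)})
            continue
        sections.append(f"=== File: {path} ===\n{text}")
        file_details.append({"path": path})

    return "\n\n".join(sections), file_details, error_details


def fake_loader(path):
    # Illustrative loader that fails for one made-up extension.
    if path.endswith(".bad"):
        raise ValueError("unsupported format")
    return f"contents of {path}"


content, loaded, failed = combine_files(["a.txt", "b.bad", "c.md"], fake_loader)
```

Here the failing file ends up in failed while the other two files still contribute to content.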

The returned LoaderResult contains the combined content from all files, plus metadata: total_files (the number of files found), processed_files (the number processed), errors (the number of failures), file_details (path, metadata, and source for each file), and error_details for any failures.
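One way to consume this metadata is a short report. The sketch below works on a plain dict that uses the key names listed above; the summarize helper is hypothetical, and the fields inside each error_details entry ("path" and "error") are an assumption, since their exact shape is not specified here.

```python
def summarize(metadata):
    """Hypothetical helper: turn loader metadata into a short text report."""
    lines = [
        f"{metadata['processed_files']}/{metadata['total_files']} files processed, "
        f"{metadata['errors']} error(s)"
    ]
    for err in metadata.get("error_details", []):
        # "path" and "error" are assumed field names for each failure record.
        lines.append(f"  failed: {err.get('path', '?')}: {err.get('error', '?')}")
    return "\n".join(lines)


# Made-up sample data in the documented shape, for illustration only.
sample = {
    "total_files": 3,
    "processed_files": 2,
    "errors": 1,
    "file_details": [{"path": "a.txt"}, {"path": "c.md"}],
    "error_details": [{"path": "b.bad", "error": "unsupported format"}],
}
report = summarize(sample)
```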

URL directory loading is explicitly not supported; only local filesystem paths are accepted.
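A guard of this kind might look like the sketch below. The function name require_local_directory, the exception types, and the additional existence check are assumptions for illustration; the actual loader may validate its input differently.

```python
import os
from urllib.parse import urlparse


def require_local_directory(source_ref):
    """Hypothetical guard: reject URLs, accept only existing local directories."""
    scheme = urlparse(source_ref).scheme.lower()
    if scheme in ("http", "https"):
        # URLs are rejected outright rather than fetched.
        raise ValueError("URL directory loading is not supported; pass a local path.")
    if not os.path.isdir(source_ref):
        raise NotADirectoryError(f"Not a directory: {source_ref}")
    return os.path.abspath(source_ref)
```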

Usage

Import DirectoryLoader when you need to ingest an entire directory of documents into the RAG knowledge base. It is typically instantiated automatically by the DataType.DIRECTORY registry.

Code Reference

Source Location

  • Repository: CrewAI
  • File: lib/crewai-tools/src/crewai_tools/rag/loaders/directory_loader.py
  • Lines: 1-166

Signature

class DirectoryLoader(BaseLoader):
    def load(self, source_content: SourceContent, **kwargs) -> LoaderResult: ...

Import

from crewai_tools.rag.loaders.directory_loader import DirectoryLoader

I/O Contract

Inputs

Name | Type | Required | Description
source_content | SourceContent | Yes | Wraps a local directory path
recursive | bool | No | Whether to search recursively (default True, passed via kwargs)
include_extensions | list[str] | No | Only include files with these extensions (passed via kwargs)
exclude_extensions | list[str] | No | Exclude files with these extensions (passed via kwargs)
max_files | int | No | Maximum number of files to process (passed via kwargs)

Outputs

Name | Type | Description
return | LoaderResult | Combined content from all files with metadata including total_files, processed_files, errors, file_details, and error_details

Usage Examples

Basic Usage

from crewai_tools.rag.loaders.directory_loader import DirectoryLoader
from crewai_tools.rag.source_content import SourceContent

loader = DirectoryLoader()

# Load all files from a directory recursively
source = SourceContent("/path/to/documents/")
result = loader.load(source)
print(f"Processed {result.metadata['processed_files']} of {result.metadata['total_files']} files")

# Load with filtering
result = loader.load(
    source,
    include_extensions=[".pdf", ".txt", ".md"],
    max_files=50,
    recursive=True,
)
