Implementation:CrewAIInc CrewAI RAG Directory Loader
| Knowledge Sources | |
|---|---|
| Domains | RAG, Data_Loading |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Recursively loads and processes all files from a local directory, automatically selecting the appropriate loader for each file type and aggregating content into a single result.
Description
DirectoryLoader extends BaseLoader to traverse directory structures and process files in bulk. It supports several filtering options: recursive traversal (default True), include_extensions and exclude_extensions lists for file type filtering, and max_files to cap the number of processed files. Hidden files and directories (starting with ".") are automatically skipped.
For each file found, the loader uses DataTypes.from_content() to detect the file type and then delegates to the appropriate specialized loader (e.g., PDFLoader for .pdf files, JSONLoader for .json files). Each file's content is prefixed with a header ("=== File: path ===") and errors during individual file processing are captured without halting the overall operation.
The returned LoaderResult contains the combined content from all files and comprehensive metadata including total_files found, processed_files count, errors count, individual file_details (path, metadata, source), and error_details for any failures.
URL directory loading is explicitly not supported; only local filesystem paths are accepted.
Usage
Import DirectoryLoader when you need to ingest an entire directory of documents into the RAG knowledge base. It is typically instantiated automatically by the DataType.DIRECTORY registry.
Code Reference
Source Location
- Repository: CrewAI
- File: lib/crewai-tools/src/crewai_tools/rag/loaders/directory_loader.py
- Lines: 1-166
Signature
class DirectoryLoader(BaseLoader):
def load(self, source_content: SourceContent, **kwargs) -> LoaderResult: ...
Import
from crewai_tools.rag.loaders.directory_loader import DirectoryLoader
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source_content | SourceContent | Yes | Wraps a local directory path |
| recursive | bool | No | Whether to search recursively (default True, passed via kwargs) |
| include_extensions | list[str] | No | Only include files with these extensions (passed via kwargs) |
| exclude_extensions | list[str] | No | Exclude files with these extensions (passed via kwargs) |
| max_files | int | No | Maximum number of files to process (passed via kwargs) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | LoaderResult | Combined content from all files with metadata including total_files, processed_files, errors, file_details, and error_details |
Usage Examples
Basic Usage
from crewai_tools.rag.loaders.directory_loader import DirectoryLoader
from crewai_tools.rag.source_content import SourceContent
loader = DirectoryLoader()
# Load all files from a directory recursively
source = SourceContent("/path/to/documents/")
result = loader.load(source)
print(f"Processed {result.metadata['processed_files']} of {result.metadata['total_files']} files")
# Load with filtering
result = loader.load(
source,
include_extensions=[".pdf", ".txt", ".md"],
max_files=50,
recursive=True,
)