Implementation:CrewAIInc CrewAI RAG Data Types

Knowledge Sources	CrewAI
Domains	RAG, Data_Loading
Last Updated	2026-02-11 00:00 GMT

Overview

Defines the data type enumeration and provides automatic mapping between content types, their appropriate loaders, and chunkers in the CrewAI RAG system.

Description

This module contains two components, the DataType enum and the DataTypes utility class, which together connect content-type detection to loader and chunker selection in the RAG pipeline.

DataType is a string enum with 17 members representing all supported content types: FILE, PDF_FILE, TEXT_FILE, CSV, JSON, XML, DOCX, MDX, MYSQL, POSTGRES, GITHUB, DIRECTORY, WEBSITE, DOCS_SITE, YOUTUBE_VIDEO, YOUTUBE_CHANNEL, and TEXT. Each enum member provides two methods:

get_chunker() returns the appropriate chunker instance by dynamically importing from the chunkers package. Text-based types use TextChunker, structured formats (CSV, JSON, XML) use specialized chunkers, and web content uses WebsiteChunker.
get_loader() returns the appropriate loader instance by dynamically importing from the loaders package. Each data type maps to a specific loader class (e.g., PDF_FILE maps to PDFLoader, GITHUB maps to GithubLoader).
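
The lazy-import dispatch described for get_chunker() and get_loader() can be illustrated with a minimal, self-contained sketch. This is not the CrewAI source: the Kind enum, its members, the get_handler() method, and the registry are all hypothetical, and standard-library classes stand in for the real chunker and loader classes so the example runs without crewai_tools installed.

```python
import importlib
from enum import Enum


class Kind(str, Enum):
    """Hypothetical stand-in for DataType; not the CrewAI enum."""

    ORDERED = "ordered"
    COUNTER = "counter"
    PLAIN = "plain"

    def get_handler(self):
        # Map members to (module path, class name). Standard-library
        # classes are placeholders for real chunker/loader classes.
        registry = {
            Kind.ORDERED: ("collections", "OrderedDict"),
            Kind.COUNTER: ("collections", "Counter"),
        }
        # Members without an entry fall back to a default class.
        module_path, class_name = registry.get(self, ("builtins", "dict"))
        # Import at call time, so modules for unused members are never loaded.
        module = importlib.import_module(module_path)
        return getattr(module, class_name)()


handler = Kind.COUNTER.get_handler()
print(type(handler).__name__)
```

Deferring the import to call time is presumably what lets a module like this list many content types without importing every loader's dependencies up front; that motivation is an inference, not something the source states.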

DataTypes is a utility class with a static from_content() method that performs automatic type detection. It examines file extensions (mapping .pdf, .csv, .json, .xml, .docx, .mdx, .md, and .txt to their respective types), URL patterns (detecting GitHub, docs sites, and general websites), and filesystem paths (distinguishing files from directories). Content that matches none of these falls back to plain TEXT.
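
The detection steps above can be approximated by the following standalone sketch. It is an illustration under stated assumptions, not the actual from_content() implementation: the detect() function, the order of the checks, the mapping of .md to MDX, and the "host starts with docs." rule are all guesses made for the example. Results are returned as the DataType member names in string form so the sketch runs without crewai_tools.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Extensions listed in the description; the target of ".md" is an assumption.
EXTENSIONS = {
    ".pdf": "PDF_FILE",
    ".csv": "CSV",
    ".json": "JSON",
    ".xml": "XML",
    ".docx": "DOCX",
    ".mdx": "MDX",
    ".md": "MDX",  # assumed mapping
    ".txt": "TEXT_FILE",
}


def detect(content=None):
    """Hypothetical detector returning a DataType member name as a string."""
    if content is None:
        return "TEXT"
    text = str(content)  # accepts str or Path

    try:
        parsed = urlparse(text)
    except ValueError:
        parsed = None
    is_url = bool(parsed and parsed.scheme in ("http", "https") and parsed.netloc)

    # 1. File extension (taken from the URL path when the input is a URL).
    target = parsed.path if is_url else text
    suffix = os.path.splitext(target)[1].lower()
    if suffix in EXTENSIONS:
        return EXTENSIONS[suffix]

    # 2. URL patterns; the docs-site rule here is an assumed heuristic.
    if is_url:
        host = parsed.netloc.lower()
        if host == "github.com" or host.endswith(".github.com"):
            return "GITHUB"
        if host.startswith("docs."):
            return "DOCS_SITE"
        return "WEBSITE"

    # 3. Filesystem paths; os.path helpers return False instead of raising.
    if os.path.isdir(text):
        return "DIRECTORY"
    if os.path.isfile(text):
        return "FILE"

    # 4. Fallback for anything unrecognized.
    return "TEXT"


print(detect(Path("data.json")))
```

The real method may order or define these checks differently, so treat the sketch only as a reading aid for the description.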

Usage

Import DataType when you need to explicitly specify a content type or access its loader/chunker. Import DataTypes when you need automatic content type detection from file paths, URLs, or strings.

Code Reference

Source Location

Repository: CrewAI
File: lib/crewai-tools/src/crewai_tools/rag/data_types.py
Lines: 1-154

Signature

class DataType(str, Enum):
    FILE = "file"
    PDF_FILE = "pdf_file"
    TEXT_FILE = "text_file"
    CSV = "csv"
    JSON = "json"
    XML = "xml"
    DOCX = "docx"
    MDX = "mdx"
    MYSQL = "mysql"
    POSTGRES = "postgres"
    GITHUB = "github"
    DIRECTORY = "directory"
    WEBSITE = "website"
    DOCS_SITE = "docs_site"
    YOUTUBE_VIDEO = "youtube_video"
    YOUTUBE_CHANNEL = "youtube_channel"
    TEXT = "text"

    def get_chunker(self) -> BaseChunker: ...
    def get_loader(self) -> BaseLoader: ...

class DataTypes:
    @staticmethod
    def from_content(content: str | Path | None = None) -> DataType: ...

Import

from crewai_tools.rag.data_types import DataType, DataTypes

I/O Contract

Inputs (DataType.get_chunker)

Name	Type	Required	Description
self	DataType	Yes	The data type enum member

Inputs (DataType.get_loader)

Name	Type	Required	Description
self	DataType	Yes	The data type enum member

Inputs (DataTypes.from_content)

Name	Type	Required	Description
content	str | Path | None	No	File path, URL, or content string whose type should be detected (defaults to None, which returns TEXT)

Outputs

Name	Type	Description
get_chunker() return	BaseChunker	Appropriate chunker instance for the data type
get_loader() return	BaseLoader	Appropriate loader instance for the data type
from_content() return	DataType	Detected data type enum member

Usage Examples

Basic Usage

from crewai_tools.rag.data_types import DataType, DataTypes

# Automatic type detection
dtype = DataTypes.from_content("/path/to/document.pdf")
# Returns DataType.PDF_FILE

dtype = DataTypes.from_content("https://github.com/crewAIInc/crewAI")
# Returns DataType.GITHUB

dtype = DataTypes.from_content("https://docs.example.com/guide")
# Returns DataType.DOCS_SITE

# Get loader and chunker for a type
loader = DataType.CSV.get_loader()     # Returns CSVLoader()
chunker = DataType.CSV.get_chunker()   # Returns CsvChunker()

# Detect and process
dtype = DataTypes.from_content("data.json")
loader = dtype.get_loader()
chunker = dtype.get_chunker()

Related Pages

Principle:CrewAIInc_CrewAI_Knowledge_Ingestion
