Implementation:CrewAIInc CrewAI Knowledge Source Classes

From Leeroopedia

Metadata

Implementation name: Knowledge Source Classes
Workflow: Knowledge_RAG_Pipeline
Category: Data Ingestion
Repository: crewAIInc/crewAI
Implements: Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection

Overview

The CrewAI knowledge subsystem provides concrete, file-format-specific knowledge source classes for loading and chunking documents. Each class inherits from BaseFileKnowledgeSource and implements its own format-specific parsing, while all of them share a common chunking and ingestion interface.

Source References

Class | Source file | Lines
PDFKnowledgeSource | src/crewai/knowledge/source/pdf_knowledge_source.py | L7-60
TextFileKnowledgeSource | src/crewai/knowledge/source/text_file_knowledge_source.py | L6-40
CSVKnowledgeSource | src/crewai/knowledge/source/csv_knowledge_source.py | L7-48

Signatures

class PDFKnowledgeSource(BaseFileKnowledgeSource):
    """Knowledge source for PDF documents."""
    def load_content(self) -> dict[Path, str]: ...
    def add(self) -> None: ...
    def _chunk_text(self, text: str) -> list[str]: ...

class TextFileKnowledgeSource(BaseFileKnowledgeSource):
    """Knowledge source for plain text files."""
    def load_content(self) -> dict[Path, str]: ...
    def add(self) -> None: ...
    def _chunk_text(self, text: str) -> list[str]: ...

class CSVKnowledgeSource(BaseFileKnowledgeSource):
    """Knowledge source for CSV files."""
    def load_content(self) -> dict[Path, str]: ...
    def add(self) -> None: ...
    def _chunk_text(self, text: str) -> list[str]: ...

BaseFileKnowledgeSource fields:

class BaseFileKnowledgeSource(BaseKnowledgeSource):
    file_paths: list[Path | str]
    chunk_size: int = 4000
    chunk_overlap: int = 200
    content: dict[Path, str] = {}
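The inheritance pattern above (a shared base that holds the file paths and chunking settings, with each subclass supplying only its parser) can be modelled in plain Python. This is a minimal standalone sketch, not crewAI's code: it uses the standard-library abc and dataclasses modules instead of crewAI's Pydantic models, and the class names FileSourceSketch and UpperTextSource are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileSourceSketch(ABC):
    """Hypothetical stand-in for BaseFileKnowledgeSource (not crewAI's class)."""

    file_paths: list[Path | str]
    chunk_size: int = 4000
    chunk_overlap: int = 200
    # default_factory gives each instance its own dict instead of a shared one
    content: dict[Path, str] = field(default_factory=dict)

    @abstractmethod
    def load_content(self) -> dict[Path, str]:
        """Each subclass supplies its own format-specific parser."""


@dataclass
class UpperTextSource(FileSourceSketch):
    """Hypothetical subclass whose 'parser' upper-cases a UTF-8 text file."""

    def load_content(self) -> dict[Path, str]:
        return {
            Path(p): Path(p).read_text(encoding="utf-8").upper()
            for p in self.file_paths
        }
```

Only the subclass can be instantiated; the base class raises TypeError because load_content() is abstract, which mirrors how each concrete source must provide its own parser.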

Import

from crewai.knowledge.source import PDFKnowledgeSource, TextFileKnowledgeSource, CSVKnowledgeSource

I/O Contract

Direction Type Description
Input file_paths: list[Path | str] List of file paths to load and parse
Input chunk_size: int Maximum characters per chunk (default: 4000)
Input chunk_overlap: int Number of characters shared by consecutive chunks (default: 200)
Output Knowledge source instance Object with loaded and chunked content, ready for embedding and storage

Method Details

load_content()

Reads files from disk and extracts text content. Returns a dictionary mapping file paths to their extracted text. Each source type uses a format-specific parser:

  • PDFKnowledgeSource -- Extracts the text of each page with a PDF parsing library (pdfplumber in crewAI)
  • TextFileKnowledgeSource -- Reads file contents directly as UTF-8 text
  • CSVKnowledgeSource -- Reads CSV rows and converts them to a text representation
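The text and CSV parsers can be sketched with the standard library alone. This is an illustration under assumptions, not crewAI's implementation: the function names are hypothetical, and joining each CSV row's cells with ", " is one plausible text representation, not necessarily the one CSVKnowledgeSource uses. The PDF case is omitted because it needs a third-party parser.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_text_files(paths: list[Path | str]) -> dict[Path, str]:
    """Hypothetical loader: read each file directly as UTF-8 text."""
    return {Path(p): Path(p).read_text(encoding="utf-8") for p in paths}


def load_csv_files(paths: list[Path | str]) -> dict[Path, str]:
    """Hypothetical loader: turn each CSV row into one line of text.

    Assumption: cells are joined with ", "; the real class may differ.
    """
    content: dict[Path, str] = {}
    for p in paths:
        # The csv module documentation recommends opening files with newline=""
        with open(p, newline="", encoding="utf-8") as f:
            rows = csv.reader(f)
            content[Path(p)] = "\n".join(", ".join(row) for row in rows)
    return content
```

Both loaders return the same dict[Path, str] shape, which is what lets the later chunking and storage steps stay format-agnostic.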

add()

Orchestrates the full ingestion pipeline for the source:

  1. Calls load_content() to extract text from files
  2. Calls _chunk_text() on the extracted text to produce chunks
  3. Saves the chunks through the configured knowledge storage backend
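The three steps of add() can be traced with an in-memory stand-in for the storage backend. Everything here is hypothetical: InMemoryStorage, SketchSource and its save() call are illustrations, not crewAI's storage API, and "files" are passed as in-memory strings so the flow can run without disk access.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class InMemoryStorage:
    """Hypothetical storage backend that only records the chunks it receives."""

    saved: list[str] = field(default_factory=list)

    def save(self, chunks: list[str]) -> None:
        self.saved.extend(chunks)


@dataclass
class SketchSource:
    """Hypothetical source whose file contents are given as strings."""

    texts: dict[Path, str]
    storage: InMemoryStorage
    chunk_size: int = 4000
    chunk_overlap: int = 200

    def load_content(self) -> dict[Path, str]:
        # A real source would read and parse files from disk here.
        return dict(self.texts)

    def _chunk_text(self, text: str) -> list[str]:
        step = self.chunk_size - self.chunk_overlap
        chunks: list[str] = []
        for start in range(0, len(text), step):
            chunks.append(text[start : start + self.chunk_size])
            if start + self.chunk_size >= len(text):
                break  # this window already reaches the end of the text
        return chunks

    def add(self) -> None:
        chunks: list[str] = []
        for text in self.load_content().values():  # step 1: extract text
            chunks.extend(self._chunk_text(text))  # step 2: chunk it
        self.storage.save(chunks)  # step 3: persist all chunks
```

For example, SketchSource({Path("a.txt"): "abcdefghij"}, InMemoryStorage(), chunk_size=4, chunk_overlap=1).add() saves the chunks "abcd", "defg" and "ghij".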

_chunk_text(text)

Splits a text string into overlapping chunks:

  1. Divides the text into segments of at most chunk_size characters
  2. Starts each new chunk chunk_size - chunk_overlap characters after the previous one, so consecutive chunks share chunk_overlap characters
  3. Returns a list of text chunk strings
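The sliding-window split described above can be sketched as a standalone function. This is a minimal sketch, not crewAI's _chunk_text(): the function name chunk_text is hypothetical, and it stops as soon as a window reaches the end of the text, so no trailing chunk is a mere copy of the previous chunk's tail. Whether crewAI's own method does the same is not stated on this page.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already covers the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap check matters: if chunk_overlap were equal to or larger than chunk_size, the window step would be zero or negative and the text could not be traversed.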

Code Examples

Creating a PDF Knowledge Source

from crewai.knowledge.source import PDFKnowledgeSource

# Create a PDF source with custom chunking parameters
pdf_source = PDFKnowledgeSource(
    file_paths=["docs/product_manual.pdf", "docs/api_reference.pdf"],
    chunk_size=4000,
    chunk_overlap=200,
)

# Content is loaded lazily when add() is called
# pdf_source.add() triggers: load_content() -> _chunk_text() -> storage.save()
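With the settings above, each new chunk starts chunk_size - chunk_overlap = 3800 characters after the previous one. Assuming the windowing stops at the first chunk that reaches the end of the text (an assumption, not confirmed for crewAI's implementation), a text of n characters with n > chunk_size yields 1 + ceil((n - chunk_size) / (chunk_size - chunk_overlap)) chunks. The helper count_chunks below is hypothetical and only does that arithmetic.

```python
def count_chunks(n_chars: int, chunk_size: int = 4000, chunk_overlap: int = 200) -> int:
    """Hypothetical helper: number of chunks for a text of n_chars characters.

    Assumes windows start every (chunk_size - chunk_overlap) characters and
    that chunking stops at the first window reaching the end of the text.
    """
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if n_chars <= 0:
        return 0
    if n_chars <= chunk_size:
        return 1
    # Integer ceiling of (n_chars - chunk_size) / step, plus the first window.
    return 1 + (n_chars - chunk_size + step - 1) // step


print(count_chunks(10_000))  # 3: windows start at 0, 3800 and 7600
```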

Creating a Text File Knowledge Source

from crewai.knowledge.source import TextFileKnowledgeSource

text_source = TextFileKnowledgeSource(
    file_paths=["notes/meeting_notes.txt", "notes/design_doc.md"],
    chunk_size=3000,
    chunk_overlap=150,
)

Creating a CSV Knowledge Source

from crewai.knowledge.source import CSVKnowledgeSource

csv_source = CSVKnowledgeSource(
    file_paths=["data/customers.csv"],
    chunk_size=2000,
    chunk_overlap=100,
)

Combining Multiple Sources

from crewai.knowledge.source import (
    PDFKnowledgeSource,
    TextFileKnowledgeSource,
    CSVKnowledgeSource,
)

sources = [
    PDFKnowledgeSource(file_paths=["manuals/guide.pdf"]),
    TextFileKnowledgeSource(file_paths=["docs/readme.txt"]),
    CSVKnowledgeSource(file_paths=["data/faq.csv"]),
]

# All sources share the same interface and can be passed to Knowledge()
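Because every source exposes the same add() method, code that ingests a mixed list never needs to know which file format each entry handles. A minimal sketch of that idea with typing.Protocol; the Ingestible protocol, the ingest_all helper and StubSource are hypothetical stand-ins for the real source classes, not part of crewAI.

```python
from typing import Protocol


class Ingestible(Protocol):
    """Hypothetical protocol: anything with an add() method can be ingested."""

    def add(self) -> None: ...


class StubSource:
    """Hypothetical stand-in for a PDF, text, or CSV source."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def add(self) -> None:
        self.log.append(f"ingested {self.name}")


def ingest_all(sources: list[Ingestible]) -> None:
    # The loop relies only on the shared add() method, never on the file format.
    for source in sources:
        source.add()


log: list[str] = []
ingest_all([StubSource("guide.pdf", log), StubSource("readme.txt", log), StubSource("faq.csv", log)])
print(log)  # ['ingested guide.pdf', 'ingested readme.txt', 'ingested faq.csv']
```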
