Implementation:CrewAIInc CrewAI Knowledge Constructor
Metadata
| Field | Value |
|---|---|
| Implementation Name | Knowledge Constructor |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Vector Storage |
| Repository | crewAIInc/crewAI |
| Implements | Principle:CrewAIInc_CrewAI_Knowledge_Ingestion |
Overview
The concrete class in the CrewAI knowledge subsystem that orchestrates knowledge source ingestion, embedding, and vector storage. The Knowledge class serves as the central coordinator, connecting sources, embedders, and storage into a unified ingestion and query pipeline.
Source Reference
| File | Lines |
|---|---|
| src/crewai/knowledge/knowledge.py | L14-118 |
Signature
class Knowledge(BaseModel):
    """Manages knowledge sources and provides query interface to stored knowledge."""

    sources: list[BaseKnowledgeSource]
    storage: KnowledgeStorage | None = None
    embedder: EmbedderConfig | None = None
    collection_name: str | None = None

    def __init__(
        self,
        collection_name: str,
        sources: list[BaseKnowledgeSource],
        embedder: EmbedderConfig | None = None,
        storage: KnowledgeStorage | None = None,
    ) -> None: ...

    def add_sources(self) -> None:
        """Ingest all configured sources into the vector store."""
        ...

    def query(
        self,
        query: list[str],
        results_limit: int = 5,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]:
        """Search the knowledge base for relevant chunks."""
        ...

    def reset(self) -> None:
        """Clear all vectors in the collection."""
        ...
Import
from crewai.knowledge.knowledge import Knowledge
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | `sources: list[BaseKnowledgeSource]` | List of configured knowledge source instances |
| Input | `embedder: EmbedderConfig \| None` | Embedding provider configuration (optional; defaults to OpenAI) |
| Input | `collection_name: str` | Name for the vector store collection |
| Input | `storage: KnowledgeStorage \| None` | Optional pre-configured storage backend |
| Output | `Knowledge` instance | Fully initialized knowledge object with an ingested vector store |
Constructor Behavior
When a Knowledge object is constructed:
- The `collection_name` is used to create or connect to a named vector collection
- If no `storage` is provided, a default `KnowledgeStorage` is created using the `embedder` and `collection_name`
- The `storage` is assigned to each source via `source.storage = self.storage`
- Sources are then ready for ingestion via `add_sources()`
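The constructor wiring described above can be sketched as follows. This is a minimal, hypothetical illustration: the `FakeStorage`, `FakeSource`, and `KnowledgeSketch` classes are simplified stand-ins for `KnowledgeStorage`, `BaseKnowledgeSource`, and `Knowledge`, not CrewAI's actual implementation.

```python
class FakeStorage:
    """Stand-in for KnowledgeStorage: just records its configuration."""
    def __init__(self, embedder=None, collection_name=None):
        self.embedder = embedder
        self.collection_name = collection_name

class FakeSource:
    """Stand-in for BaseKnowledgeSource: receives a storage reference."""
    def __init__(self):
        self.storage = None

class KnowledgeSketch:
    """Stand-in for Knowledge, showing only the documented constructor steps."""
    def __init__(self, collection_name, sources, embedder=None, storage=None):
        # If no storage is supplied, build a default one from the
        # embedder config and collection name.
        self.storage = storage or FakeStorage(
            embedder=embedder, collection_name=collection_name
        )
        self.sources = sources
        # Hand every source a reference to the shared storage backend.
        for source in self.sources:
            source.storage = self.storage

src = FakeSource()
kb = KnowledgeSketch("product_docs", [src], embedder={"provider": "openai"})
print(kb.storage.collection_name)  # product_docs
print(src.storage is kb.storage)   # True
```

The key design point is that all sources share one storage backend, so chunks from every source land in the same named collection.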
Method Details
add_sources()
Iterates over all configured sources and calls `source.add()` on each one. This triggers the full ingestion pipeline for each source:
- `source.load_content()` -- parse and extract text
- `source._chunk_text()` -- segment into chunks
- `source.storage.save()` -- embed and store chunks
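The load-chunk-save pipeline above can be sketched in isolation. The classes and the fixed-size chunking below are hypothetical stand-ins used only to show the control flow, not CrewAI's real source or storage code.

```python
class SketchStorage:
    """Stand-in vector store: collects saved chunks in a list."""
    def __init__(self):
        self.saved = []

    def save(self, chunks):
        # A real backend would embed each chunk before storing it.
        self.saved.extend(chunks)

class SketchSource:
    """Stand-in knowledge source following the documented method names."""
    def __init__(self, text, storage):
        self.text = text
        self.storage = storage

    def load_content(self):
        # Parse and extract text (here the input is already plain text).
        return self.text

    def _chunk_text(self, text, chunk_size=10):
        # Segment into fixed-size chunks (real sources also use overlap).
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    def add(self):
        # Full pipeline: load -> chunk -> save.
        chunks = self._chunk_text(self.load_content())
        self.storage.save(chunks)

def add_sources(sources):
    # Mirrors Knowledge.add_sources(): ingest every configured source.
    for source in sources:
        source.add()

storage = SketchStorage()
add_sources([SketchSource("hello world, this is a doc", storage)])
print(storage.saved)  # ['hello worl', 'd, this is', ' a doc']
```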
query(query, results_limit, score_threshold)
Searches the vector store for chunks semantically similar to the query:
- `query` -- List of query strings to search for
- `results_limit` -- Maximum number of results to return (default: 5)
- `score_threshold` -- Minimum similarity score for inclusion (default: 0.6)
- Returns a list of `SearchResult` objects containing the matched chunks, their metadata, and similarity scores
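The interplay of `results_limit` and `score_threshold` can be shown with a small sketch. The scores and the inclusive-threshold filter below are assumptions for illustration; they are not produced by a real embedder or by CrewAI's storage layer.

```python
def filter_results(scored_chunks, results_limit=5, score_threshold=0.6):
    # Keep only chunks at or above the similarity threshold,
    # then return the top results_limit by score.
    kept = [c for c in scored_chunks if c["score"] >= score_threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:results_limit]

scored = [
    {"content": "network setup", "score": 0.91},
    {"content": "intro chapter", "score": 0.42},   # below threshold, dropped
    {"content": "dns options", "score": 0.77},
]
top = filter_results(scored, results_limit=2, score_threshold=0.6)
print([c["content"] for c in top])  # ['network setup', 'dns options']
```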
reset()
Clears all vectors and metadata from the collection. This is useful for re-ingesting sources after document updates.
Code Examples
Basic Knowledge Creation and Ingestion
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource
# Configure source
pdf_source = PDFKnowledgeSource(
file_paths=["docs/product_manual.pdf"],
chunk_size=4000,
chunk_overlap=200,
)
# Create Knowledge object and ingest
knowledge = Knowledge(
collection_name="product_docs",
sources=[pdf_source],
embedder={
"provider": "openai",
"config": {"model": "text-embedding-3-small"},
},
)
knowledge.add_sources()
Querying Ingested Knowledge
# After ingestion, query the knowledge base
results = knowledge.query(
query=["How do I configure the network settings?"],
results_limit=5,
score_threshold=0.6,
)
for result in results:
    print(f"Score: {result.score}")
    print(f"Content: {result.content[:200]}...")
Re-ingesting After Document Updates
# Clear existing vectors
knowledge.reset()
# Update sources with new documents
knowledge.sources = [
PDFKnowledgeSource(file_paths=["docs/product_manual_v2.pdf"]),
]
# Re-ingest
knowledge.add_sources()
Multiple Sources with Custom Storage
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource, CSVKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
# Pre-configure storage
storage = KnowledgeStorage(
embedder={"provider": "ollama", "config": {"model": "nomic-embed-text"}},
collection_name="support_kb",
)
knowledge = Knowledge(
collection_name="support_kb",
sources=[
PDFKnowledgeSource(file_paths=["docs/manual.pdf"]),
CSVKnowledgeSource(file_paths=["data/faq.csv"]),
],
storage=storage,
)
knowledge.add_sources()
Related Pages
- Principle:CrewAIInc_CrewAI_Knowledge_Ingestion -- The principle this implements
- Implementation:CrewAIInc_CrewAI_Knowledge_Source_Classes -- Source classes consumed by Knowledge
- Implementation:CrewAIInc_CrewAI_Embedder_Config -- Embedder configuration type
- Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search -- Storage backend used for persistence and search
- Implementation:CrewAIInc_CrewAI_Knowledge_Attachment_Config -- How Knowledge is auto-created from Crew/Agent config
- Environment:CrewAIInc_CrewAI_Python_Runtime_Environment