Implementation:CrewAIInc CrewAI Knowledge Constructor

From Leeroopedia

Metadata

Field Value
Implementation Name Knowledge Constructor
Workflow Knowledge_RAG_Pipeline
Category Vector Storage
Repository crewAIInc/crewAI
Implements Principle:CrewAIInc_CrewAI_Knowledge_Ingestion

Overview

The Knowledge class is the concrete orchestrator of knowledge source ingestion, embedding, and vector storage in the CrewAI knowledge subsystem. It serves as the central coordinator that connects sources, embedders, and storage into a unified ingestion and query pipeline.

Source Reference

File Lines
src/crewai/knowledge/knowledge.py L14-118

Signature

class Knowledge(BaseModel):
    """Manages knowledge sources and provides query interface to stored knowledge."""

    sources: list[BaseKnowledgeSource]
    storage: KnowledgeStorage | None = None
    embedder: EmbedderConfig | None = None
    collection_name: str | None = None

    def __init__(
        self,
        collection_name: str,
        sources: list[BaseKnowledgeSource],
        embedder: EmbedderConfig | None = None,
        storage: KnowledgeStorage | None = None,
    ) -> None: ...

    def add_sources(self) -> None:
        """Ingest all configured sources into the vector store."""
        ...

    def query(
        self,
        query: list[str],
        results_limit: int = 5,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]:
        """Search the knowledge base for relevant chunks."""
        ...

    def reset(self) -> None:
        """Clear all vectors in the collection."""
        ...

Import

from crewai.knowledge.knowledge import Knowledge

I/O Contract

Direction Type Description
Input sources: list[BaseKnowledgeSource] List of configured knowledge source instances
Input embedder: EmbedderConfig | None Embedding provider configuration (optional; defaults to OpenAI)
Input collection_name: str Name for the vector store collection
Input storage: KnowledgeStorage | None Optional pre-configured storage backend
Output Knowledge instance Fully initialized knowledge object with ingested vector store

Constructor Behavior

When a Knowledge object is constructed:

  1. The collection_name is used to create or connect to a named vector collection
  2. If no storage is provided, a default KnowledgeStorage is created using the embedder and collection_name
  3. The storage is assigned to each source via source.storage = self.storage
  4. Sources are ready for ingestion via add_sources()

Method Details

add_sources()

Iterates over all configured sources and calls source.add() on each one. This triggers the full ingestion pipeline for each source:

  1. source.load_content() -- parse and extract text
  2. source._chunk_text() -- segment into chunks
  3. source.storage.save() -- embed and store chunks
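To make the pipeline concrete, here is a simplified, self-contained sketch of the per-source ingestion flow. The method names mirror the CrewAI API described above, but the bodies (in-memory storage, naive fixed-width chunking) are illustrative stand-ins:

```python
class InMemoryStorage:
    """Stand-in for KnowledgeStorage: collects chunks instead of embedding them."""
    def __init__(self):
        self.chunks = []

    def save(self, chunks):
        # In CrewAI this step embeds the chunks and writes them to the vector store.
        self.chunks.extend(chunks)

class StringKnowledgeSource:
    """Illustrative source: holds raw text and chunks it on ingestion."""
    def __init__(self, content, chunk_size=20):
        self.content = content
        self.chunk_size = chunk_size
        self.storage = None

    def load_content(self):
        return self.content

    def _chunk_text(self, text):
        # Naive fixed-width chunking; the real sources also apply chunk_overlap.
        return [text[i:i + self.chunk_size]
                for i in range(0, len(text), self.chunk_size)]

    def add(self):
        text = self.load_content()       # 1. parse and extract text
        chunks = self._chunk_text(text)  # 2. segment into chunks
        self.storage.save(chunks)        # 3. embed and store chunks

def add_sources(sources, storage):
    """Mirrors Knowledge.add_sources(): ingest every configured source."""
    for source in sources:
        source.storage = storage
        source.add()
```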

query(query, results_limit, score_threshold)

Searches the vector store for chunks semantically similar to the query:

  • query -- List of query strings to search for
  • results_limit -- Maximum number of results to return (default: 5)
  • score_threshold -- Minimum similarity score for inclusion (default: 0.6)
  • Returns a list of SearchResult objects containing matched chunks, metadata, and similarity scores
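The filtering semantics of results_limit and score_threshold can be illustrated with a small self-contained sketch using cosine similarity over toy vectors. This is not the actual CrewAI query implementation, only the retain-above-threshold-then-take-top-k pattern it describes:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query(query_vec, stored, results_limit=5, score_threshold=0.6):
    """stored: list of (content, vector) pairs. Returns (content, score) pairs."""
    scored = [(content, cosine(query_vec, vec)) for content, vec in stored]
    # Keep only chunks at or above the similarity threshold...
    hits = [(c, s) for c, s in scored if s >= score_threshold]
    # ...then return at most results_limit results, best match first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:results_limit]
```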

reset()

Clears all vectors and metadata from the collection. This is useful for re-ingesting sources after document updates.

Code Examples

Basic Knowledge Creation and Ingestion

from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource

# Configure source
pdf_source = PDFKnowledgeSource(
    file_paths=["docs/product_manual.pdf"],
    chunk_size=4000,
    chunk_overlap=200,
)

# Create Knowledge object and ingest
knowledge = Knowledge(
    collection_name="product_docs",
    sources=[pdf_source],
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
)
knowledge.add_sources()

Querying Ingested Knowledge

# After ingestion, query the knowledge base
results = knowledge.query(
    query=["How do I configure the network settings?"],
    results_limit=5,
    score_threshold=0.6,
)

for result in results:
    print(f"Score: {result.score}")
    print(f"Content: {result.content[:200]}...")

Re-ingesting After Document Updates

# Clear existing vectors
knowledge.reset()

# Update sources with new documents
knowledge.sources = [
    PDFKnowledgeSource(file_paths=["docs/product_manual_v2.pdf"]),
]

# Re-ingest
knowledge.add_sources()

Multiple Sources with Custom Storage

from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource, CSVKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

# Pre-configure storage
storage = KnowledgeStorage(
    embedder={"provider": "ollama", "config": {"model": "nomic-embed-text"}},
    collection_name="support_kb",
)

knowledge = Knowledge(
    collection_name="support_kb",
    sources=[
        PDFKnowledgeSource(file_paths=["docs/manual.pdf"]),
        CSVKnowledgeSource(file_paths=["data/faq.csv"]),
    ],
    storage=storage,
)
knowledge.add_sources()
