Implementation:CrewAIInc CrewAI Knowledge Constructor
Metadata
| Field | Value |
|---|---|
| Implementation Name | Knowledge Constructor |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Vector Storage |
| Repository | crewAIInc/crewAI |
| Implements | Principle:CrewAIInc_CrewAI_Knowledge_Ingestion |
Overview
The concrete class in the CrewAI knowledge subsystem that orchestrates knowledge source ingestion, embedding, and vector storage. The Knowledge class serves as the central coordinator, connecting sources, embedders, and storage into a unified ingestion and query pipeline.
Source Reference
| File | Lines |
|---|---|
| src/crewai/knowledge/knowledge.py | L14-118 |
Signature
class Knowledge(BaseModel):
    """Manages knowledge sources and provides query interface to stored knowledge."""

    sources: list[BaseKnowledgeSource]
    storage: KnowledgeStorage | None = None
    embedder: EmbedderConfig | None = None
    collection_name: str | None = None

    def __init__(
        self,
        collection_name: str,
        sources: list[BaseKnowledgeSource],
        embedder: EmbedderConfig | None = None,
        storage: KnowledgeStorage | None = None,
    ) -> None: ...

    def add_sources(self) -> None:
        """Ingest all configured sources into the vector store."""
        ...

    def query(
        self,
        query: list[str],
        results_limit: int = 5,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]:
        """Search the knowledge base for relevant chunks."""
        ...

    def reset(self) -> None:
        """Clear all vectors in the collection."""
        ...
Import
from crewai.knowledge.knowledge import Knowledge
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | `sources: list[BaseKnowledgeSource]` | List of configured knowledge source instances |
| Input | `embedder: EmbedderConfig \| None` | Embedding provider configuration (optional; defaults to OpenAI) |
| Input | `collection_name: str` | Name for the vector store collection |
| Input | `storage: KnowledgeStorage \| None` | Optional pre-configured storage backend |
| Output | `Knowledge` instance | Fully initialized knowledge object with an ingested vector store |
Constructor Behavior
When a Knowledge object is constructed:
- The `collection_name` is used to create or connect to a named vector collection
- If no `storage` is provided, a default `KnowledgeStorage` is created using the `embedder` and `collection_name`
- The `storage` is assigned to each source via `source.storage = self.storage`
- Sources are then ready for ingestion via `add_sources()`
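The constructor wiring described above can be sketched as follows. This is a minimal, hypothetical illustration: the `FakeStorage`, `FakeSource`, and `KnowledgeSketch` classes are simplified stand-ins for `KnowledgeStorage`, `BaseKnowledgeSource`, and `Knowledge`, not CrewAI's actual implementation.

```python
class FakeStorage:
    """Stand-in for KnowledgeStorage: just records its configuration."""
    def __init__(self, embedder=None, collection_name=None):
        self.embedder = embedder
        self.collection_name = collection_name

class FakeSource:
    """Stand-in for BaseKnowledgeSource: receives a storage reference."""
    def __init__(self):
        self.storage = None

class KnowledgeSketch:
    """Stand-in for Knowledge, showing only the documented constructor steps."""
    def __init__(self, collection_name, sources, embedder=None, storage=None):
        # If no storage is supplied, build a default one from the
        # embedder config and collection name.
        self.storage = storage or FakeStorage(
            embedder=embedder, collection_name=collection_name
        )
        self.sources = sources
        # Hand every source a reference to the shared storage backend.
        for source in self.sources:
            source.storage = self.storage

src = FakeSource()
kb = KnowledgeSketch("product_docs", [src], embedder={"provider": "openai"})
print(kb.storage.collection_name)  # product_docs
print(src.storage is kb.storage)   # True
```

The key design point is that all sources share one storage backend, so chunks from every source land in the same named collection.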
Method Details
add_sources()
Iterates over all configured sources and calls `source.add()` on each one. This triggers the full ingestion pipeline for each source:
- `source.load_content()` -- parse and extract text
- `source._chunk_text()` -- segment into chunks
- `source.storage.save()` -- embed and store chunks
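The load-chunk-save pipeline above can be sketched in isolation. The classes and the fixed-size chunking below are hypothetical stand-ins used only to show the control flow, not CrewAI's real source or storage code.

```python
class SketchStorage:
    """Stand-in vector store: collects saved chunks in a list."""
    def __init__(self):
        self.saved = []

    def save(self, chunks):
        # A real backend would embed each chunk before storing it.
        self.saved.extend(chunks)

class SketchSource:
    """Stand-in knowledge source following the documented method names."""
    def __init__(self, text, storage):
        self.text = text
        self.storage = storage

    def load_content(self):
        # Parse and extract text (here the input is already plain text).
        return self.text

    def _chunk_text(self, text, chunk_size=10):
        # Segment into fixed-size chunks (real sources also use overlap).
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    def add(self):
        # Full pipeline: load -> chunk -> save.
        chunks = self._chunk_text(self.load_content())
        self.storage.save(chunks)

def add_sources(sources):
    # Mirrors Knowledge.add_sources(): ingest every configured source.
    for source in sources:
        source.add()

storage = SketchStorage()
add_sources([SketchSource("hello world, this is a doc", storage)])
print(storage.saved)  # ['hello worl', 'd, this is', ' a doc']
```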
query(query, results_limit, score_threshold)
Searches the vector store for chunks semantically similar to the query:
- `query` -- List of query strings to search for
- `results_limit` -- Maximum number of results to return (default: 5)
- `score_threshold` -- Minimum similarity score for inclusion (default: 0.6)
- Returns a list of `SearchResult` objects containing the matched chunks, their metadata, and similarity scores
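The interplay of `results_limit` and `score_threshold` can be shown with a small sketch. The scores and the inclusive-threshold filter below are assumptions for illustration; they are not produced by a real embedder or by CrewAI's storage layer.

```python
def filter_results(scored_chunks, results_limit=5, score_threshold=0.6):
    # Keep only chunks at or above the similarity threshold,
    # then return the top results_limit by score.
    kept = [c for c in scored_chunks if c["score"] >= score_threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:results_limit]

scored = [
    {"content": "network setup", "score": 0.91},
    {"content": "intro chapter", "score": 0.42},   # below threshold, dropped
    {"content": "dns options", "score": 0.77},
]
top = filter_results(scored, results_limit=2, score_threshold=0.6)
print([c["content"] for c in top])  # ['network setup', 'dns options']
```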
reset()
Clears all vectors and metadata from the collection. This is useful for re-ingesting sources after document updates.
Code Examples
Basic Knowledge Creation and Ingestion
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource
# Configure source
pdf_source = PDFKnowledgeSource(
file_paths=["docs/product_manual.pdf"],
chunk_size=4000,
chunk_overlap=200,
)
# Create Knowledge object and ingest
knowledge = Knowledge(
collection_name="product_docs",
sources=[pdf_source],
embedder={
"provider": "openai",
"config": {"model": "text-embedding-3-small"},
},
)
knowledge.add_sources()
Querying Ingested Knowledge
# After ingestion, query the knowledge base
results = knowledge.query(
query=["How do I configure the network settings?"],
results_limit=5,
score_threshold=0.6,
)
for result in results:
    print(f"Score: {result.score}")
    print(f"Content: {result.content[:200]}...")
Re-ingesting After Document Updates
# Clear existing vectors
knowledge.reset()
# Update sources with new documents
knowledge.sources = [
PDFKnowledgeSource(file_paths=["docs/product_manual_v2.pdf"]),
]
# Re-ingest
knowledge.add_sources()
Multiple Sources with Custom Storage
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source import PDFKnowledgeSource, CSVKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
# Pre-configure storage
storage = KnowledgeStorage(
embedder={"provider": "ollama", "config": {"model": "nomic-embed-text"}},
collection_name="support_kb",
)
knowledge = Knowledge(
collection_name="support_kb",
sources=[
PDFKnowledgeSource(file_paths=["docs/manual.pdf"]),
CSVKnowledgeSource(file_paths=["data/faq.csv"]),
],
storage=storage,
)
knowledge.add_sources()
Related Pages
- Principle:CrewAIInc_CrewAI_Knowledge_Ingestion -- The principle this implements
- Implementation:CrewAIInc_CrewAI_Knowledge_Source_Classes -- Source classes consumed by Knowledge
- Implementation:CrewAIInc_CrewAI_Embedder_Config -- Embedder configuration type
- Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search -- Storage backend used for persistence and search
- Implementation:CrewAIInc_CrewAI_Knowledge_Attachment_Config -- How Knowledge is auto-created from Crew/Agent config
- Environment:CrewAIInc_CrewAI_Python_Runtime_Environment