Metadata
Overview
Concrete vector storage backend provided by the CrewAI knowledge subsystem, supporting vector similarity search with score-based filtering. This class manages the vector database connection (ChromaDB by default) and provides both write (save) and read (search) operations on embedded document collections.
Source Reference
Signature
class KnowledgeStorage(BaseKnowledgeStorage):
    """Vector storage backend for knowledge chunks."""

    def __init__(
        self,
        embedder: ProviderSpec | BaseEmbeddingsProvider[Any] | type[BaseEmbeddingsProvider[Any]] | None = None,
        collection_name: str | None = None,
    ) -> None: ...

    def search(
        self,
        query: list[str],
        limit: int = 5,
        metadata_filter: dict | None = None,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]: ...

    def save(self, documents: list[str]) -> None: ...

    def reset(self) -> None: ...
Import
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
I/O Contract
search()
| Direction | Type | Description |
|---|---|---|
| Input | query: list[str] | List of query strings to search for |
| Input | limit: int | Maximum number of results to return (default: 5) |
| Input | metadata_filter: dict \| None | Optional metadata-based filtering criteria (default: None) |
| Input | score_threshold: float | Minimum similarity score for inclusion (default: 0.6) |
| Output | list[SearchResult] | List of search results with id, content, metadata, and score |
save()
| Direction | Type | Description |
|---|---|---|
| Input | documents: list[str] | List of text chunks to embed and store |
| Output | None | Chunks are embedded and persisted in the vector collection |
reset()
| Direction | Type | Description |
|---|---|---|
| Input | None | No parameters |
| Output | None | All vectors and metadata in the collection are deleted |
SearchResult Structure
The SearchResult object returned by search() contains:
| Field | Type | Description |
|---|---|---|
| id | str | Unique identifier of the stored chunk |
| content | str | The text content of the matched chunk |
| metadata | dict | Metadata associated with the chunk (source file, position, etc.) |
| score | float | Cosine similarity score between query and chunk (0.0 to 1.0) |
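The field layout above can be sketched as a plain dataclass. This is an illustrative shape only; the actual SearchResult type in crewai may be a TypedDict or Pydantic model, and the name `SearchResultSketch` is made up for this example:

```python
from dataclasses import dataclass


@dataclass
class SearchResultSketch:
    """Illustrative shape of a search result; fields mirror the table above."""

    id: str
    content: str
    metadata: dict
    score: float  # cosine similarity, 0.0 (unrelated) to 1.0 (identical)


hit = SearchResultSketch(
    id="chunk-001",
    content="To configure network settings...",
    metadata={"source": "manual.pdf", "chunk_index": 0},
    score=0.82,
)
```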
Method Details
search(query, limit, metadata_filter, score_threshold)
- Each query string in the query list is embedded using the configured embedding model
- The embedded query vectors are used to perform a nearest-neighbor search in the vector collection
- Results are ranked by cosine similarity score in descending order
- Results below score_threshold are filtered out
- The top limit results are returned as SearchResult objects
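The ranking and filtering steps above can be sketched in plain Python. The scored `(id, score)` pairs stand in for real vector-search output, and `rank_and_filter` is an illustrative helper, not a CrewAI internal:

```python
def rank_and_filter(scored, limit=5, score_threshold=0.6):
    """Keep results at or above the threshold, sorted by score descending, capped at limit."""
    kept = [r for r in scored if r[1] >= score_threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:limit]


hits = [("a", 0.91), ("b", 0.55), ("c", 0.73), ("d", 0.62)]
top = rank_and_filter(hits, limit=2)  # [("a", 0.91), ("c", 0.73)]
```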
save(documents)
- Each text document in the list is embedded using the configured embedding model
- The embedded vectors are stored in the named collection in the vector database
- Metadata (e.g., chunk index, source identifier) is stored alongside each vector
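How per-chunk metadata travels alongside each document can be sketched as follows. This is a toy record builder under stated assumptions: the real backend embeds the text and writes to ChromaDB, and `build_records` with its hash-based ids is purely illustrative:

```python
import hashlib


def build_records(documents, source="unknown"):
    """Pair each chunk with a deterministic id and per-chunk metadata."""
    records = []
    for i, text in enumerate(documents):
        # Content-derived id keeps re-saves of identical chunks stable
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        records.append({
            "id": doc_id,
            "content": text,
            "metadata": {"source": source, "chunk_index": i},
        })
    return records


records = build_records(["chunk one", "chunk two"], source="manual.txt")
```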
reset()
Deletes the entire collection from the vector database, removing all stored vectors and metadata. A new collection with the same name will be created on the next save() call.
Code Examples
Manual Search Call
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
# Create storage with embedder configuration
storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="product_docs",
)
# Search for relevant chunks
results = storage.search(
    query=["How do I configure network settings?"],
    limit=5,
    score_threshold=0.6,
)

for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.content[:200]}")
    print("---")
Saving Documents Manually
# Save text chunks to the vector store
storage.save(documents=[
    "To configure network settings, navigate to Settings > Network...",
    "The default network configuration uses DHCP for automatic IP assignment...",
    "For manual IP configuration, set the following parameters: IP address...",
])
Automatic Retrieval During Task Execution
During normal crew execution, the storage search is called automatically by the framework:
# This is internal framework code (simplified for illustration)
# Users do NOT write this -- it happens automatically
def _retrieve_knowledge_for_task(task, knowledge):
    """Called internally during task execution."""
    query = [task.description]
    results = knowledge.storage.search(
        query=query,
        limit=5,
        score_threshold=0.6,
    )
    # Retrieved chunks are appended to the agent's prompt
    context = "\n".join([r.content for r in results])
    return context
Using Metadata Filters
# Filter results by metadata (e.g., source file)
results = storage.search(
    query=["deployment instructions"],
    limit=3,
    metadata_filter={"source": "ops_manual.pdf"},
    score_threshold=0.5,
)
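The effect of metadata_filter can be sketched as equality matching over each result's metadata. In practice the filtering is delegated to the vector database's where-clause support; `matches_filter` below is an illustrative post-filter, not CrewAI code:

```python
def matches_filter(metadata, metadata_filter):
    """True if every filter key is present in metadata with an equal value."""
    return all(metadata.get(k) == v for k, v in metadata_filter.items())


results = [
    {"content": "Deploy via CI...", "metadata": {"source": "ops_manual.pdf"}},
    {"content": "Network setup...", "metadata": {"source": "net_guide.pdf"}},
]
filtered = [
    r for r in results
    if matches_filter(r["metadata"], {"source": "ops_manual.pdf"})
]
```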
Related Pages