
Implementation:CrewAIInc CrewAI Knowledge Storage Search

From Leeroopedia

Metadata

Field: Value
Implementation Name: Knowledge Storage Search
Workflow: Knowledge_RAG_Pipeline
Category: Information Retrieval
Repository: crewAIInc/crewAI
Implements: Principle:CrewAIInc_CrewAI_Semantic_Retrieval

Overview

KnowledgeStorage is the concrete vector-storage backend of the CrewAI knowledge subsystem; its search() method performs vector similarity search with score-threshold filtering. The class manages the vector database connection (ChromaDB by default) and provides both write (save) and read (search) operations on embedded document collections.

Source Reference

File: src/crewai/knowledge/storage/knowledge_storage.py
Lines: 55-80

Signature

class KnowledgeStorage(BaseKnowledgeStorage):
    """Vector storage backend for knowledge chunks."""

    def __init__(
        self,
        embedder: ProviderSpec | BaseEmbeddingsProvider[Any] | type[BaseEmbeddingsProvider[Any]] | None = None,
        collection_name: str | None = None,
    ) -> None: ...

    def search(
        self,
        query: list[str],
        limit: int = 5,
        metadata_filter: dict | None = None,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]: ...

    def save(self, documents: list[str]) -> None: ...

    def reset(self) -> None: ...

Import

from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

I/O Contract

search()

Direction Type Description
Input query: list[str] List of query strings to search for
Input limit: int Maximum number of results to return (default: 5)
Input metadata_filter: dict | None Optional metadata-based filtering criteria (default: None)
Input score_threshold: float Minimum similarity score for inclusion (default: 0.6)
Output list[SearchResult] List of search results with id, content, metadata, and score

save()

Direction Type Description
Input documents: list[str] List of text chunks to embed and store
Output None Chunks are embedded and persisted in the vector collection

reset()

Direction Type Description
Input None No parameters
Output None All vectors and metadata in the collection are deleted

SearchResult Structure

The SearchResult object returned by search() contains:

Field Type Description
id str Unique identifier of the stored chunk
content str The text content of the matched chunk
metadata dict Metadata associated with the chunk (source file, position, etc.)
score float Cosine similarity score between query and chunk (0.0 to 1.0)

Method Details

search(query, limit, metadata_filter, score_threshold)

  1. Each query string in the query list is embedded using the configured embedding model
  2. The embedded query vectors are used to perform a nearest-neighbor search in the vector collection
  3. Results are ranked by cosine similarity score in descending order
  4. Results below score_threshold are filtered out
  5. The top limit results are returned as SearchResult objects
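
The ranking-and-filtering logic in steps 1-5 can be sketched in plain Python. This is a toy illustration, not CrewAI code: `toy_embed` (a bag-of-characters vector) stands in for the real embedding model, and a plain list stands in for the ChromaDB collection; `toy_search` is a hypothetical name.

```python
import math


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model:
    # a tiny bag-of-characters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_search(query: str, chunks: list[str], limit: int = 5,
               score_threshold: float = 0.6) -> list[tuple[float, str]]:
    qvec = toy_embed(query)                                      # step 1: embed the query
    scored = [(cosine(qvec, toy_embed(c)), c) for c in chunks]   # step 2: score each chunk
    scored.sort(key=lambda pair: pair[0], reverse=True)          # step 3: rank descending
    kept = [p for p in scored if p[0] >= score_threshold]        # step 4: drop low scores
    return kept[:limit]                                          # step 5: keep top `limit`


chunks = ["configure network settings", "bake a chocolate cake"]
hits = toy_search("network configuration", chunks, limit=1)
print(hits[0][1])  # the network chunk wins; the cake chunk falls below 0.6
```

The defaults (limit=5, score_threshold=0.6) mirror the real method's signature; the real implementation delegates the nearest-neighbor search to the vector database rather than scoring every chunk client-side.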

save(documents)

  1. Each text document in the list is embedded using the configured embedding model
  2. The embedded vectors are stored in the named collection in the vector database
  3. Metadata (e.g., chunk index, source identifier) is stored alongside each vector
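
The save path can be sketched the same way. Again a toy illustration, not CrewAI code: an in-memory list stands in for the vector collection, and `toy_embed` / `toy_save` are hypothetical names.

```python
collection: list[dict] = []  # in-memory stand-in for the vector collection


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model.
    return [float(len(text)), float(text.count(" "))]


def toy_save(documents: list[str], source: str) -> None:
    for i, doc in enumerate(documents):
        collection.append({
            "id": f"{source}-{i}",                             # unique chunk identifier
            "vector": toy_embed(doc),                          # steps 1-2: embed and store
            "content": doc,
            "metadata": {"source": source, "chunk_index": i},  # step 3: metadata alongside
        })


toy_save(["chunk one", "chunk two text"], source="manual.txt")
print(len(collection))  # 2 entries stored
```

In the real class, the embedding and upsert are handled by the configured embedder and the ChromaDB collection rather than a Python list.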

reset()

Deletes the entire collection from the vector database, removing all stored vectors and metadata. A new collection with the same name will be created on the next save() call.

Code Examples

Manual Search Call

from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

# Create storage with embedder configuration
storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="product_docs",
)

# Search for relevant chunks
results = storage.search(
    query=["How do I configure network settings?"],
    limit=5,
    score_threshold=0.6,
)

for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.content[:200]}")
    print("---")

Saving Documents Manually

# Save text chunks to the vector store
storage.save(documents=[
    "To configure network settings, navigate to Settings > Network...",
    "The default network configuration uses DHCP for automatic IP assignment...",
    "For manual IP configuration, set the following parameters: IP address...",
])

Automatic Retrieval During Task Execution

During normal crew execution, the framework calls the storage search automatically, roughly as follows:

# This is internal framework code (simplified for illustration)
# Users do NOT write this -- it happens automatically

def _retrieve_knowledge_for_task(task, knowledge):
    """Called internally during task execution."""
    query = [task.description]
    results = knowledge.storage.search(
        query=query,
        limit=5,
        score_threshold=0.6,
    )
    # Retrieved chunks are appended to the agent's prompt
    context = "\n".join([r.content for r in results])
    return context

Using Metadata Filters

# Filter results by metadata (e.g., source file)
results = storage.search(
    query=["deployment instructions"],
    limit=3,
    metadata_filter={"source": "ops_manual.pdf"},
    score_threshold=0.5,
)
