Metadata
Overview
Concrete vector storage backend provided by the CrewAI knowledge subsystem, supporting vector similarity search with score-based filtering. This class manages the vector database connection (ChromaDB by default) and provides both write (save) and read (search) operations on embedded document collections.
Source Reference
Signature
class KnowledgeStorage(BaseKnowledgeStorage):
    """Vector storage backend for knowledge chunks."""

    def __init__(
        self,
        embedder: ProviderSpec | BaseEmbeddingsProvider[Any] | type[BaseEmbeddingsProvider[Any]] | None = None,
        collection_name: str | None = None,
    ) -> None: ...

    def search(
        self,
        query: list[str],
        limit: int = 5,
        metadata_filter: dict | None = None,
        score_threshold: float = 0.6,
    ) -> list[SearchResult]: ...

    def save(self, documents: list[str]) -> None: ...

    def reset(self) -> None: ...
Import
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
I/O Contract
search()
| Direction | Type | Description |
|---|---|---|
| Input | query: list[str] | List of query strings to search for |
| Input | limit: int | Maximum number of results to return (default: 5) |
| Input | metadata_filter: dict \| None | Optional metadata-based filtering criteria (default: None) |
| Input | score_threshold: float | Minimum similarity score for inclusion (default: 0.6) |
| Output | list[SearchResult] | List of search results with id, content, metadata, and score |
save()
| Direction | Type | Description |
|---|---|---|
| Input | documents: list[str] | List of text chunks to embed and store |
| Output | None | Chunks are embedded and persisted in the vector collection |
reset()
| Direction | Type | Description |
|---|---|---|
| Input | None | No parameters |
| Output | None | All vectors and metadata in the collection are deleted |
SearchResult Structure
The SearchResult object returned by search() contains:
| Field | Type | Description |
|---|---|---|
| id | str | Unique identifier of the stored chunk |
| content | str | The text content of the matched chunk |
| metadata | dict | Metadata associated with the chunk (source file, position, etc.) |
| score | float | Cosine similarity score between query and chunk (0.0 to 1.0) |
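The field layout above can be sketched as a plain dataclass. This is an illustrative shape only; the actual SearchResult type in crewai may be a TypedDict or Pydantic model, and the name `SearchResultSketch` is made up for this example:

```python
from dataclasses import dataclass


@dataclass
class SearchResultSketch:
    """Illustrative shape of a search result; fields mirror the table above."""

    id: str
    content: str
    metadata: dict
    score: float  # cosine similarity, 0.0 (unrelated) to 1.0 (identical)


hit = SearchResultSketch(
    id="chunk-001",
    content="To configure network settings...",
    metadata={"source": "manual.pdf", "chunk_index": 0},
    score=0.82,
)
```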
Method Details
search(query, limit, metadata_filter, score_threshold)
- Each query string in the query list is embedded using the configured embedding model
- The embedded query vectors are used to perform a nearest-neighbor search in the vector collection
- Results are ranked by cosine similarity score in descending order
- Results below score_threshold are filtered out
- The top limit results are returned as SearchResult objects
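The ranking and filtering steps above can be sketched in plain Python. The scored `(id, score)` pairs stand in for real vector-search output, and `rank_and_filter` is an illustrative helper, not a CrewAI internal:

```python
def rank_and_filter(scored, limit=5, score_threshold=0.6):
    """Keep results at or above the threshold, sorted by score descending, capped at limit."""
    kept = [r for r in scored if r[1] >= score_threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:limit]


hits = [("a", 0.91), ("b", 0.55), ("c", 0.73), ("d", 0.62)]
top = rank_and_filter(hits, limit=2)  # [("a", 0.91), ("c", 0.73)]
```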
save(documents)
- Each text document in the list is embedded using the configured embedding model
- The embedded vectors are stored in the named collection in the vector database
- Metadata (e.g., chunk index, source identifier) is stored alongside each vector
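How per-chunk metadata travels alongside each document can be sketched as follows. This is a toy record builder under stated assumptions: the real backend embeds the text and writes to ChromaDB, and `build_records` with its hash-based ids is purely illustrative:

```python
import hashlib


def build_records(documents, source="unknown"):
    """Pair each chunk with a deterministic id and per-chunk metadata."""
    records = []
    for i, text in enumerate(documents):
        # Content-derived id keeps re-saves of identical chunks stable
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        records.append({
            "id": doc_id,
            "content": text,
            "metadata": {"source": source, "chunk_index": i},
        })
    return records


records = build_records(["chunk one", "chunk two"], source="manual.txt")
```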
reset()
Deletes the entire collection from the vector database, removing all stored vectors and metadata. A new collection with the same name will be created on the next save() call.
Code Examples
Manual Search Call
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
# Create storage with embedder configuration
storage = KnowledgeStorage(
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    collection_name="product_docs",
)
# Search for relevant chunks
results = storage.search(
    query=["How do I configure network settings?"],
    limit=5,
    score_threshold=0.6,
)

for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.content[:200]}")
    print("---")
Saving Documents Manually
# Save text chunks to the vector store
storage.save(documents=[
    "To configure network settings, navigate to Settings > Network...",
    "The default network configuration uses DHCP for automatic IP assignment...",
    "For manual IP configuration, set the following parameters: IP address...",
])
Automatic Retrieval During Task Execution
During normal crew execution, the storage search is called automatically by the framework:
# This is internal framework code (simplified for illustration)
# Users do NOT write this -- it happens automatically
def _retrieve_knowledge_for_task(task, knowledge):
    """Called internally during task execution."""
    query = [task.description]
    results = knowledge.storage.search(
        query=query,
        limit=5,
        score_threshold=0.6,
    )
    # Retrieved chunks are appended to the agent's prompt
    context = "\n".join([r.content for r in results])
    return context
Using Metadata Filters
# Filter results by metadata (e.g., source file)
results = storage.search(
    query=["deployment instructions"],
    limit=3,
    metadata_filter={"source": "ops_manual.pdf"},
    score_threshold=0.5,
)
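The effect of metadata_filter can be sketched as equality matching over each result's metadata. In practice the filtering is delegated to the vector database's where-clause support; `matches_filter` below is an illustrative post-filter, not CrewAI code:

```python
def matches_filter(metadata, metadata_filter):
    """True if every filter key is present in metadata with an equal value."""
    return all(metadata.get(k) == v for k, v in metadata_filter.items())


results = [
    {"content": "Deploy via CI...", "metadata": {"source": "ops_manual.pdf"}},
    {"content": "Network setup...", "metadata": {"source": "net_guide.pdf"}},
]
filtered = [
    r for r in results
    if matches_filter(r["metadata"], {"source": "ops_manual.pdf"})
]
```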
Related Pages