Principle:CrewAIInc_CrewAI_Semantic_Retrieval
Metadata
| Field | Value |
|---|---|
| Principle Name | Semantic Retrieval |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Information Retrieval |
| Repository | crewAIInc/crewAI |
| Implemented By | Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search |
Overview
An automatic retrieval mechanism that searches the vector store for relevant knowledge chunks during task execution, augmenting the agent's prompt with contextual information. Semantic Retrieval is the core "R" in RAG (Retrieval-Augmented Generation), enabling agents to ground their responses in specific domain knowledge.
Description
During task execution, the agent's task description is used to generate a search query against the knowledge vector store. The most relevant chunks (filtered by score threshold and capped by a result limit) are appended to the agent's prompt as additional context. This happens automatically when knowledge is attached to the crew or agent; no explicit retrieval code is needed from the user.
The retrieval process works as follows:
- The agent receives a task to execute
- The task description (and optionally the expected output) is used as the search query
- The query is embedded using the same embedding model that was used during ingestion
- The embedded query is compared against all stored document vectors using cosine similarity
- The top-k results above the score threshold are returned
- Retrieved text chunks are formatted and appended to the agent's system prompt or task context
- The agent generates its response with the retrieved context available
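The steps above can be sketched in plain Python. This is a minimal illustration of the pattern, not CrewAI's internal code: `embed` and `vector_store` are hypothetical stand-ins for the configured embedding function and storage backend.

```python
from typing import Callable

def retrieve_context(
    task_description: str,
    embed: Callable[[str], list[float]],  # same model used at ingestion time
    vector_store,                         # exposes search(vector, limit=...)
    results_limit: int = 5,
    score_threshold: float = 0.6,
) -> str:
    """Embed the task, search the store, and format chunks for the prompt."""
    # The task description serves as the query; it is embedded with the
    # same model used during ingestion so the vectors are comparable.
    query_vector = embed(task_description)
    # search() is assumed to return (chunk_text, cosine_score) pairs, best first
    hits = vector_store.search(query_vector, limit=results_limit)
    # Drop anything below the minimum similarity score
    relevant = [text for text, score in hits if score >= score_threshold]
    # The joined chunks are what gets appended to the agent's prompt
    return "\n\n".join(relevant)
```

In the real framework this cycle runs transparently inside task execution; the sketch only makes the embed-search-filter-format sequence explicit.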
Theoretical Basis
Semantic Retrieval implements the Retrieval-Augmented Generation (RAG) pattern, which was introduced to address limitations of large language models:
- Knowledge cutoff -- LLMs have a training data cutoff and cannot access information published after that date. RAG provides up-to-date context.
- Hallucination -- LLMs can generate plausible but incorrect information. RAG grounds responses in specific, verified documents.
- Domain specificity -- General-purpose LLMs lack specialized domain knowledge. RAG injects domain-specific context into the prompt.
The retrieval mechanism relies on the dual-encoder architecture where:
- Documents are encoded into vectors at ingestion time (offline)
- Queries are encoded into vectors at search time (online)
- Relevance is measured by cosine similarity between query and document vectors
- Results are filtered by a score threshold to ensure minimum quality
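The cosine similarity that measures relevance can be computed directly from the definition, dot(a, b) / (|a| * |b|):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because the score depends only on direction, not magnitude, documents of very different lengths can still be compared meaningfully.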
Retrieval Parameters
| Parameter | Default | Description |
|---|---|---|
| results_limit | 5 | Maximum number of chunks to retrieve per query |
| score_threshold | 0.6 | Minimum cosine similarity score for inclusion |
| metadata_filter | None | Optional metadata-based filtering of results |
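Taken together, the three parameters act as a post-search filter on scored results. The sketch below illustrates that logic only; the `hits` structure is an assumption for illustration, not the storage layer's actual return type.

```python
from typing import Optional

def filter_results(
    hits: list[dict],                 # each: {"text", "score", "metadata"}
    results_limit: int = 5,
    score_threshold: float = 0.6,
    metadata_filter: Optional[dict] = None,
) -> list[dict]:
    """Keep chunks above the threshold, matching the metadata, capped at the limit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < score_threshold:
            continue  # below minimum similarity
        if metadata_filter and any(
            hit["metadata"].get(k) != v for k, v in metadata_filter.items()
        ):
            continue  # fails the optional metadata filter
        kept.append(hit)
    return kept[:results_limit]  # cap to avoid context window overflow
```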
Automatic vs. Manual Retrieval
| Mode | Trigger | User Code Required |
|---|---|---|
| Automatic | Task execution with attached knowledge | None -- happens transparently |
| Manual | Calling knowledge.query() directly | User writes query code |
| Crew API | Calling crew.query_knowledge() | User writes query code |
In the automatic mode, the framework handles the entire retrieval-augmentation cycle. The user only needs to configure knowledge sources and attach them to a Crew or Agent.
Usage Context
Semantic Retrieval is the fifth step in the Knowledge RAG Pipeline:
- Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
- Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
- Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
- Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
- Retrieve relevant chunks during task execution (this principle)
- Optionally query knowledge directly (see Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying)
Design Decisions
- Automatic retrieval -- By making retrieval automatic and transparent, the framework eliminates boilerplate code and ensures knowledge is always utilized when available.
- Score threshold filtering -- A minimum similarity score prevents irrelevant chunks from polluting the agent's context. The default of 0.6 provides a reasonable balance between recall and precision.
- Result limit -- Capping the number of returned chunks prevents context window overflow and focuses the agent on the most relevant information.
- Task-based querying -- Using the task description as the query ensures that retrieval is aligned with the agent's current objective.
Quality Considerations
The quality of semantic retrieval depends on several factors:
- Embedding model quality -- Higher-quality embedding models produce better vector representations, improving retrieval accuracy.
- Chunk size -- Smaller chunks provide more precise retrieval but may lack context; larger chunks provide more context but may include irrelevant information.
- Chunk overlap -- Overlap ensures that information spanning chunk boundaries is retrievable.
- Score threshold -- Too high a threshold may miss relevant results; too low may include noise.
- Query formulation -- The task description should be clear and specific to produce good retrieval results.
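The chunk-size and overlap trade-offs above can be seen in a minimal sliding-window chunker. This is a sketch of the general technique, not CrewAI's internal chunker, and the default values are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, overlapping by `overlap`.

    Overlap means a sentence straddling a chunk boundary appears whole in at
    least one chunk, so it remains retrievable.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final window already reaches the end of the text
    return chunks
```

Smaller `chunk_size` values give more precise hits at the cost of context; larger `overlap` values improve boundary recall at the cost of redundant storage.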
Example Scenario
An agent is tasked with answering a customer question about password reset. The automatic retrieval process:
```python
from crewai import Crew, Agent, Task
from crewai.knowledge.source import PDFKnowledgeSource

# Knowledge is pre-configured and attached
crew = Crew(
    agents=[support_agent],
    tasks=[
        Task(
            description="Answer: How do I reset my password?",
            expected_output="Step-by-step instructions",
            agent=support_agent,
        )
    ],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/user_guide.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# During kickoff(), the following happens automatically:
# 1. "How do I reset my password?" is embedded as a query vector
# 2. The vector store is searched for similar chunks
# 3. Top chunks (e.g., "To reset your password, go to Settings > Security...")
#    are appended to the agent's prompt
# 4. The agent generates a grounded response using the retrieved context
result = crew.kickoff()
```
Related Pages
- Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search -- Concrete search implementation
- Principle:CrewAIInc_CrewAI_Knowledge_Attachment -- Previous step: attaching knowledge to crews/agents
- Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying -- Alternative: programmatic querying
- Principle:CrewAIInc_CrewAI_Embedding_Configuration -- Embedding model that powers the search
- Heuristic:CrewAIInc_CrewAI_RAG_Search_Defaults