Principle:CrewAIInc_CrewAI_Semantic_Retrieval
Metadata
| Field | Value |
|---|---|
| Principle Name | Semantic Retrieval |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Information Retrieval |
| Repository | crewAIInc/crewAI |
| Implemented By | Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search |
Overview
An automatic retrieval mechanism that searches the vector store for relevant knowledge chunks during task execution, augmenting the agent's prompt with contextual information. Semantic Retrieval is the core "R" in RAG (Retrieval-Augmented Generation), enabling agents to ground their responses in specific domain knowledge.
Description
During task execution, the agent's task description is used to generate a search query against the knowledge vector store. The most relevant chunks (filtered by score threshold and capped by a result limit) are appended to the agent's prompt as additional context. This happens automatically when knowledge is attached to the crew or agent; no explicit retrieval code is needed from the user.
The retrieval process works as follows:
- The agent receives a task to execute
- The task description (and optionally the expected output) is used as the search query
- The query is embedded using the same embedding model that was used during ingestion
- The embedded query is compared against all stored document vectors using cosine similarity
- The top-k results above the score threshold are returned
- Retrieved text chunks are formatted and appended to the agent's system prompt or task context
- The agent generates its response with the retrieved context available
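The steps above can be sketched in plain Python. This is a minimal illustration of the pattern, not CrewAI's internal code: `embed` and `vector_store` are hypothetical stand-ins for the configured embedding function and storage backend.

```python
from typing import Callable

def retrieve_context(
    task_description: str,
    embed: Callable[[str], list[float]],  # same model used at ingestion time
    vector_store,                         # exposes search(vector, limit=...)
    results_limit: int = 5,
    score_threshold: float = 0.6,
) -> str:
    """Embed the task, search the store, and format chunks for the prompt."""
    # The task description serves as the query; it is embedded with the
    # same model used during ingestion so the vectors are comparable.
    query_vector = embed(task_description)
    # search() is assumed to return (chunk_text, cosine_score) pairs, best first
    hits = vector_store.search(query_vector, limit=results_limit)
    # Drop anything below the minimum similarity score
    relevant = [text for text, score in hits if score >= score_threshold]
    # The joined chunks are what gets appended to the agent's prompt
    return "\n\n".join(relevant)
```

In the real framework this cycle runs transparently inside task execution; the sketch only makes the embed-search-filter-format sequence explicit.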
Theoretical Basis
Semantic Retrieval implements the Retrieval-Augmented Generation (RAG) pattern, which was introduced to address limitations of large language models:
- Knowledge cutoff -- LLMs have a training data cutoff and cannot access information published after that date. RAG provides up-to-date context.
- Hallucination -- LLMs can generate plausible but incorrect information. RAG grounds responses in specific, verified documents.
- Domain specificity -- General-purpose LLMs lack specialized domain knowledge. RAG injects domain-specific context into the prompt.
The retrieval mechanism relies on the dual-encoder architecture where:
- Documents are encoded into vectors at ingestion time (offline)
- Queries are encoded into vectors at search time (online)
- Relevance is measured by cosine similarity between query and document vectors
- Results are filtered by a score threshold to ensure minimum quality
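The cosine similarity that measures relevance can be computed directly from the definition, dot(a, b) / (|a| * |b|):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because the score depends only on direction, not magnitude, documents of very different lengths can still be compared meaningfully.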
Retrieval Parameters
| Parameter | Default | Description |
|---|---|---|
| results_limit | 5 | Maximum number of chunks to retrieve per query |
| score_threshold | 0.6 | Minimum cosine similarity score for inclusion |
| metadata_filter | None | Optional metadata-based filtering of results |
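Taken together, the three parameters act as a post-search filter on scored results. The sketch below illustrates that logic only; the `hits` structure is an assumption for illustration, not the storage layer's actual return type.

```python
from typing import Optional

def filter_results(
    hits: list[dict],                 # each: {"text", "score", "metadata"}
    results_limit: int = 5,
    score_threshold: float = 0.6,
    metadata_filter: Optional[dict] = None,
) -> list[dict]:
    """Keep chunks above the threshold, matching the metadata, capped at the limit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < score_threshold:
            continue  # below minimum similarity
        if metadata_filter and any(
            hit["metadata"].get(k) != v for k, v in metadata_filter.items()
        ):
            continue  # fails the optional metadata filter
        kept.append(hit)
    return kept[:results_limit]  # cap to avoid context window overflow
```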
Automatic vs. Manual Retrieval
| Mode | Trigger | User Code Required |
|---|---|---|
| Automatic | Task execution with attached knowledge | None -- happens transparently |
| Manual | Calling knowledge.query() directly | User writes query code |
| Crew API | Calling crew.query_knowledge() | User writes query code |
In the automatic mode, the framework handles the entire retrieval-augmentation cycle. The user only needs to configure knowledge sources and attach them to a Crew or Agent.
Usage Context
Semantic Retrieval is the fifth step in the Knowledge RAG Pipeline:
- Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
- Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
- Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
- Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
- Retrieve relevant chunks during task execution (this principle)
- Optionally query knowledge directly (see Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying)
Design Decisions
- Automatic retrieval -- By making retrieval automatic and transparent, the framework eliminates boilerplate code and ensures knowledge is always utilized when available.
- Score threshold filtering -- A minimum similarity score prevents irrelevant chunks from polluting the agent's context. The default of 0.6 provides a reasonable balance between recall and precision.
- Result limit -- Capping the number of returned chunks prevents context window overflow and focuses the agent on the most relevant information.
- Task-based querying -- Using the task description as the query ensures that retrieval is aligned with the agent's current objective.
Quality Considerations
The quality of semantic retrieval depends on several factors:
- Embedding model quality -- Higher-quality embedding models produce better vector representations, improving retrieval accuracy.
- Chunk size -- Smaller chunks provide more precise retrieval but may lack context; larger chunks provide more context but may include irrelevant information.
- Chunk overlap -- Overlap ensures that information spanning chunk boundaries is retrievable.
- Score threshold -- Too high a threshold may miss relevant results; too low may include noise.
- Query formulation -- The task description should be clear and specific to produce good retrieval results.
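The chunk-size and overlap trade-offs above can be seen in a minimal sliding-window chunker. This is a sketch of the general technique, not CrewAI's internal chunker, and the default values are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, overlapping by `overlap`.

    Overlap means a sentence straddling a chunk boundary appears whole in at
    least one chunk, so it remains retrievable.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # final window already reaches the end of the text
    return chunks
```

Smaller `chunk_size` values give more precise hits at the cost of context; larger `overlap` values improve boundary recall at the cost of redundant storage.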
Example Scenario
An agent is tasked with answering a customer question about password reset. The automatic retrieval process:
```python
from crewai import Crew, Agent, Task
from crewai.knowledge.source import PDFKnowledgeSource

# Knowledge is pre-configured and attached
crew = Crew(
    agents=[support_agent],
    tasks=[
        Task(
            description="Answer: How do I reset my password?",
            expected_output="Step-by-step instructions",
            agent=support_agent,
        )
    ],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/user_guide.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# During kickoff(), the following happens automatically:
# 1. "How do I reset my password?" is embedded as a query vector
# 2. The vector store is searched for similar chunks
# 3. Top chunks (e.g., "To reset your password, go to Settings > Security...")
#    are appended to the agent's prompt
# 4. The agent generates a grounded response using the retrieved context
result = crew.kickoff()
```
Related Pages
- Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search -- Concrete search implementation
- Principle:CrewAIInc_CrewAI_Knowledge_Attachment -- Previous step: attaching knowledge to crews/agents
- Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying -- Alternative: programmatic querying
- Principle:CrewAIInc_CrewAI_Embedding_Configuration -- Embedding model that powers the search
- Heuristic:CrewAIInc_CrewAI_RAG_Search_Defaults