
Principle:CrewAIInc CrewAI Semantic Retrieval

From Leeroopedia

Metadata

Field Value
Principle Name Semantic Retrieval
Workflow Knowledge_RAG_Pipeline
Category Information Retrieval
Repository crewAIInc/crewAI
Implemented By Implementation:CrewAIInc_CrewAI_Knowledge_Storage_Search

Overview

An automatic retrieval mechanism that searches the vector store for relevant knowledge chunks during task execution, augmenting the agent's prompt with contextual information. Semantic Retrieval is the core "R" in RAG (Retrieval-Augmented Generation), enabling agents to ground their responses in specific domain knowledge.

Description

During task execution, the agent's task description is used to generate a search query against the knowledge vector store. The most relevant chunks (filtered by a score threshold and capped by a result limit) are appended to the agent's prompt as additional context. This happens automatically whenever knowledge is attached to the crew or agent; no explicit retrieval code is required from the user.

The retrieval process works as follows:

  1. The agent receives a task to execute
  2. The task description (and optionally the expected output) is used as the search query
  3. The query is embedded using the same embedding model that was used during ingestion
  4. The embedded query is compared against all stored document vectors using cosine similarity
  5. The top-k results above the score threshold are returned
  6. Retrieved text chunks are formatted and appended to the agent's system prompt or task context
  7. The agent generates its response with the retrieved context available
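
The steps above can be sketched in plain Python. This is a toy illustration of the retrieval cycle, not CrewAI's internals: `embed` is a deterministic stand-in for a real embedding model, and `retrieve` shows the score-then-filter flow of steps 3-5.

```python
import math

def embed(text, dims=4):
    # Toy deterministic "embedding": bucket characters into a small vector.
    # A real system would call the configured embedding model here.
    vec = [0.0] * dims
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(task_description, store, results_limit=5, score_threshold=0.6):
    query_vec = embed(task_description)            # step 3: embed the query
    scored = [(cosine(query_vec, vec), text)       # step 4: compare to all docs
              for text, vec in store]
    scored = [(s, t) for s, t in scored if s >= score_threshold]  # step 5
    scored.sort(reverse=True)
    return [t for _, t in scored[:results_limit]]

# Offline ingestion: documents are embedded once and stored.
store = [(doc, embed(doc)) for doc in [
    "To reset your password, go to Settings > Security.",
    "Invoices are emailed on the first of each month.",
]]

chunks = retrieve("How do I reset my password?", store)
# Step 6: retrieved chunks are appended to the prompt as context.
prompt = "Answer the question.\n\nContext:\n" + "\n".join(chunks)
```

With a real embedding model, semantically related chunks score highest even when they share no keywords with the query; the toy hash above cannot do that, which is exactly why embedding model quality matters for retrieval accuracy.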

Theoretical Basis

Semantic Retrieval implements the Retrieval-Augmented Generation (RAG) pattern, which was introduced to address limitations of large language models:

  • Knowledge cutoff -- LLMs have a training data cutoff and cannot access information published after that date. RAG provides up-to-date context.
  • Hallucination -- LLMs can generate plausible but incorrect information. RAG grounds responses in specific, verified documents.
  • Domain specificity -- General-purpose LLMs lack specialized domain knowledge. RAG injects domain-specific context into the prompt.

The retrieval mechanism relies on the dual-encoder architecture where:

  • Documents are encoded into vectors at ingestion time (offline)
  • Queries are encoded into vectors at search time (online)
  • Relevance is measured by cosine similarity between query and document vectors
  • Results are filtered by a score threshold to ensure minimum quality
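
Concretely, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes, so it measures direction rather than length:

```python
import math

def cosine_similarity(q, d):
    """Cosine of the angle between vectors: q.d / (|q| * |d|)."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm = math.sqrt(sum(qi * qi for qi in q)) * math.sqrt(sum(di * di for di in d))
    return dot / norm if norm else 0.0

# Vectors pointing the same way score 1.0 regardless of magnitude;
# orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # → 0.0
```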

Retrieval Parameters

Parameter Default Description
results_limit 5 Maximum number of chunks to retrieve per query
score_threshold 0.6 Minimum cosine similarity score for inclusion
metadata_filter None Optional metadata-based filtering of results
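
A hypothetical sketch of how the three parameters interact on a list of scored hits. The function name and filtering order here are illustrative, not CrewAI's actual code:

```python
def apply_retrieval_params(scored_chunks, results_limit=5,
                           score_threshold=0.6, metadata_filter=None):
    """Apply the table's parameters to (score, text, metadata) hits."""
    hits = scored_chunks
    if metadata_filter:  # optional metadata-based filtering
        hits = [h for h in hits
                if all(h[2].get(k) == v for k, v in metadata_filter.items())]
    hits = [h for h in hits if h[0] >= score_threshold]  # minimum similarity
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:results_limit]                          # cap result count

hits = [
    (0.91, "Reset via Settings > Security.", {"source": "user_guide.pdf"}),
    (0.55, "Billing FAQ.", {"source": "billing.pdf"}),
    (0.72, "Password complexity rules.", {"source": "user_guide.pdf"}),
]
top = apply_retrieval_params(hits, metadata_filter={"source": "user_guide.pdf"})
# The 0.55 hit is dropped by the 0.6 threshold; the rest are ranked by score.
```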

Automatic vs. Manual Retrieval

Mode Trigger User Code Required
Automatic Task execution with attached knowledge None -- happens transparently
Manual Calling knowledge.query() directly User writes query code
Crew API Calling crew.query_knowledge() User writes query code

In the automatic mode, the framework handles the entire retrieval-augmentation cycle. The user only needs to configure knowledge sources and attach them to a Crew or Agent.

Usage Context

Semantic Retrieval is the fifth step in the Knowledge RAG Pipeline:

  1. Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
  2. Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
  3. Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
  4. Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
  5. Retrieve relevant chunks during task execution (this principle)
  6. Optionally query knowledge directly (see Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying)

Design Decisions

  • Automatic retrieval -- By making retrieval automatic and transparent, the framework eliminates boilerplate code and ensures knowledge is always utilized when available.
  • Score threshold filtering -- A minimum similarity score prevents irrelevant chunks from polluting the agent's context. The default of 0.6 provides a reasonable balance between recall and precision.
  • Result limit -- Capping the number of returned chunks prevents context window overflow and focuses the agent on the most relevant information.
  • Task-based querying -- Using the task description as the query ensures that retrieval is aligned with the agent's current objective.

Quality Considerations

The quality of semantic retrieval depends on several factors:

  • Embedding model quality -- Higher-quality embedding models produce better vector representations, improving retrieval accuracy.
  • Chunk size -- Smaller chunks provide more precise retrieval but may lack context; larger chunks provide more context but may include irrelevant information.
  • Chunk overlap -- Overlap ensures that information spanning chunk boundaries is retrievable.
  • Score threshold -- Too high a threshold may miss relevant results; too low may include noise.
  • Query formulation -- The task description should be clear and specific to produce good retrieval results.
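
The chunk size and overlap trade-off can be illustrated with a simple character-based sliding window (CrewAI's own chunker may split on different boundaries):

```python
def chunk_text(text, chunk_size=50, chunk_overlap=10):
    """Split text into fixed-size windows. The overlap means a phrase that
    straddles a chunk boundary still appears whole in at least one chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "".join(chr(97 + i % 26) for i in range(120))
chunks = chunk_text(sample, chunk_size=50, chunk_overlap=10)
# The tail of each chunk repeats as the head of the next one.
```

Larger `chunk_size` values pack more context into each hit at the cost of retrieval precision; larger `chunk_overlap` values reduce boundary loss at the cost of storing redundant text.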

Example Scenario

An agent is tasked with answering a customer question about password reset. The automatic retrieval process:

from crewai import Agent, Crew, Task
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

# A minimal support agent for the scenario
support_agent = Agent(
    role="Support Specialist",
    goal="Answer customer account questions accurately",
    backstory="Knows the product's user guide inside out.",
)

# Knowledge is pre-configured and attached
crew = Crew(
    agents=[support_agent],
    tasks=[
        Task(
            description="Answer: How do I reset my password?",
            expected_output="Step-by-step instructions",
            agent=support_agent,
        )
    ],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/user_guide.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# During kickoff(), the following happens automatically:
# 1. The task description is embedded as a query vector
# 2. Vector store is searched for similar chunks
# 3. Top chunks (e.g., "To reset your password, go to Settings > Security...")
#    are appended to the agent's prompt
# 4. The agent generates a grounded response using the retrieved context
result = crew.kickoff()
