Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying
Metadata
| Field | Value |
|---|---|
| Principle Name | Direct Knowledge Querying |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Programmatic Access |
| Repository | crewAIInc/crewAI |
| Implemented By | Implementation:CrewAIInc_CrewAI_Crew_Query_Knowledge |
Overview
An explicit programmatic interface for querying a crew's knowledge base outside of the normal agent execution flow, enabling direct retrieval for debugging, testing, or custom application logic. Direct Knowledge Querying provides an escape hatch from the automatic retrieval mechanism, giving developers full control over when and how knowledge is accessed.
Description
Direct Knowledge Querying provides a way to query the crew's knowledge base without running an agent execution loop. This is useful for testing that knowledge was ingested correctly, building custom retrieval interfaces, or integrating knowledge search into application code that operates outside the crew/agent framework.
While the standard RAG pipeline (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval) handles retrieval automatically during task execution, there are scenarios where direct programmatic access is essential:
- Debugging ingestion -- Verifying that documents were chunked and embedded correctly by running test queries
- Testing retrieval quality -- Evaluating whether the right chunks are returned for expected queries before deploying to production
- Custom UIs -- Building search interfaces that expose the knowledge base to end users directly
- Hybrid workflows -- Combining crew-based agent execution with direct knowledge queries in the same application
- Pre-filtering -- Querying knowledge before task execution to determine whether sufficient context exists
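The pre-filtering scenario can be sketched independently of CrewAI. In this sketch, `query_fn` and `has_sufficient_context` are hypothetical names standing in for a bound `crew.query_knowledge` call and application-level gating logic:

```python
def has_sufficient_context(query_fn, queries, min_results=2):
    """Return True when the knowledge base yields enough hits to proceed.

    query_fn is a hypothetical stand-in for crew.query_knowledge: it takes a
    list of query strings and returns a list of results, or None when no
    knowledge is configured.
    """
    results = query_fn(queries)
    # Treat a missing knowledge base (None) the same as too few results.
    return results is not None and len(results) >= min_results


# Stubbed query function simulating a crew with two matching chunks.
stub = lambda queries: [{"content": "chunk-a"}, {"content": "chunk-b"}]
print(has_sufficient_context(stub, ["password reset"]))        # True
print(has_sufficient_context(lambda q: None, ["password reset"]))  # False
```

An application could use such a check to skip kicking off a full crew run when no relevant context exists.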
Theoretical Basis
Direct Knowledge Querying exposes the underlying vector search through an explicit API, bypassing the agent reasoning layer. This follows the principle of separation of concerns:
- The agent layer handles reasoning, planning, and response generation
- The knowledge layer handles storage and retrieval of domain-specific information
- Direct querying exposes the knowledge layer independently, without requiring the agent layer
This separation enables developers to interact with each layer independently, which is essential for testing, debugging, and building custom integrations.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | list[str] | (required) | One or more query strings to search for |
| results_limit | int | 3 | Maximum number of results to return |
| score_threshold | float | 0.35 | Minimum similarity score for inclusion |
Note that the default score_threshold for direct querying (0.35) is lower than the default for automatic retrieval (0.6). This reflects the exploratory nature of direct queries, where users may want to see more results even at lower confidence levels.
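The interplay of results_limit and score_threshold can be illustrated with a plain-Python sketch. This mirrors only the documented parameter semantics, not CrewAI's internals; the function name and data shapes are hypothetical:

```python
def filter_results(scored_chunks, results_limit=3, score_threshold=0.35):
    """Keep chunks scoring at or above the threshold, best-first, capped at the limit.

    Illustrative only: models the documented query parameters, not
    CrewAI's actual retrieval code. scored_chunks is a list of
    (score, content) tuples.
    """
    kept = [c for c in scored_chunks if c[0] >= score_threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return kept[:results_limit]


candidates = [
    (0.72, "reset via email"),
    (0.41, "account lockout"),
    (0.30, "billing FAQ"),      # below the 0.35 default -- dropped
    (0.55, "password policy"),
]
for score, content in filter_results(candidates):
    print(f"[{score:.2f}] {content}")
```

With the stricter 0.6 threshold used by automatic retrieval, only the 0.72 chunk above would survive, which is why direct queries default lower for exploration.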
Usage Context
Direct Knowledge Querying is an optional sixth step in the Knowledge RAG Pipeline, used when programmatic access is needed:
1. Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
2. Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
3. Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
4. Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
5. Retrieve relevant chunks during task execution (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval)
6. Optionally query knowledge directly (this principle)
Design Decisions
- Crew-level method -- The query method is on the Crew class, not the Agent class, because knowledge is ultimately managed at the crew level even when agent-level sources exist.
- Lower default threshold -- A lower score threshold (0.35 vs. 0.6) reflects the exploratory use case where developers want to see what is available rather than only the most confident matches.
- Optional return type -- The method returns None if no knowledge is configured, making it safe to call on any crew regardless of knowledge configuration.
- Same query interface -- Uses the same list[str] query format as the internal retrieval, ensuring consistency between direct and automatic queries.
Example Scenarios
Debugging Ingestion
```python
from crewai import Crew
from crewai.knowledge.source import PDFKnowledgeSource

crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/manual.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# Verify ingestion worked correctly
results = crew.query_knowledge(["password reset"])
if results:
    for r in results:
        print(f"[{r.score:.3f}] {r.content[:100]}")
else:
    print("No results found -- check ingestion")
```
Building a Search Interface
```python
# In a web application endpoint
def search_knowledge(user_query: str):
    results = crew.query_knowledge(
        query=[user_query],
        results_limit=10,
        score_threshold=0.3,
    )
    return [{"content": r.content, "score": r.score} for r in (results or [])]
```
Related Pages
- Implementation:CrewAIInc_CrewAI_Crew_Query_Knowledge -- Concrete method implementation
- Principle:CrewAIInc_CrewAI_Semantic_Retrieval -- Automatic retrieval during task execution
- Principle:CrewAIInc_CrewAI_Knowledge_Attachment -- Knowledge attachment that enables querying