Principle:CrewAIInc_CrewAI_Direct_Knowledge_Querying
Metadata
| Field | Value |
|---|---|
| Principle Name | Direct Knowledge Querying |
| Workflow | Knowledge_RAG_Pipeline |
| Category | Programmatic Access |
| Repository | crewAIInc/crewAI |
| Implemented By | Implementation:CrewAIInc_CrewAI_Crew_Query_Knowledge |
Overview
An explicit programmatic interface for querying a crew's knowledge base outside of the normal agent execution flow, enabling direct retrieval for debugging, testing, or custom application logic. Direct Knowledge Querying provides an escape hatch from the automatic retrieval mechanism, giving developers full control over when and how knowledge is accessed.
Description
Direct Knowledge Querying provides a way to query the crew's knowledge base without running an agent execution loop. This is useful for testing that knowledge was ingested correctly, building custom retrieval interfaces, or integrating knowledge search into application code that operates outside the crew/agent framework.
While the standard RAG pipeline (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval) handles retrieval automatically during task execution, there are scenarios where direct programmatic access is essential:
- Debugging ingestion -- Verifying that documents were chunked and embedded correctly by running test queries
- Testing retrieval quality -- Evaluating whether the right chunks are returned for expected queries before deploying to production
- Custom UIs -- Building search interfaces that expose the knowledge base to end users directly
- Hybrid workflows -- Combining crew-based agent execution with direct knowledge queries in the same application
- Pre-filtering -- Querying knowledge before task execution to determine whether sufficient context exists
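The pre-filtering scenario can be sketched independently of CrewAI. In this sketch, `query_fn` and `has_sufficient_context` are hypothetical names standing in for a bound `crew.query_knowledge` call and application-level gating logic:

```python
def has_sufficient_context(query_fn, queries, min_results=2):
    """Return True when the knowledge base yields enough hits to proceed.

    query_fn is a hypothetical stand-in for crew.query_knowledge: it takes a
    list of query strings and returns a list of results, or None when no
    knowledge is configured.
    """
    results = query_fn(queries)
    # Treat a missing knowledge base (None) the same as too few results.
    return results is not None and len(results) >= min_results


# Stubbed query function simulating a crew with two matching chunks.
stub = lambda queries: [{"content": "chunk-a"}, {"content": "chunk-b"}]
print(has_sufficient_context(stub, ["password reset"]))        # True
print(has_sufficient_context(lambda q: None, ["password reset"]))  # False
```

An application could use such a check to skip kicking off a full crew run when no relevant context exists.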
Theoretical Basis
Direct Knowledge Querying exposes the underlying vector search through an explicit API, bypassing the agent reasoning layer. This follows the principle of separation of concerns:
- The agent layer handles reasoning, planning, and response generation
- The knowledge layer handles storage and retrieval of domain-specific information
- Direct querying exposes the knowledge layer independently, without requiring the agent layer
This separation enables developers to interact with each layer independently, which is essential for testing, debugging, and building custom integrations.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | list[str] | (required) | One or more query strings to search for |
| results_limit | int | 3 | Maximum number of results to return |
| score_threshold | float | 0.35 | Minimum similarity score for inclusion |
Note that the default score_threshold for direct querying (0.35) is lower than the default for automatic retrieval (0.6). This reflects the exploratory nature of direct queries, where users may want to see more results even at lower confidence levels.
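The interplay of results_limit and score_threshold can be illustrated with a plain-Python sketch. This mirrors only the documented parameter semantics, not CrewAI's internals; the function name and data shapes are hypothetical:

```python
def filter_results(scored_chunks, results_limit=3, score_threshold=0.35):
    """Keep chunks scoring at or above the threshold, best-first, capped at the limit.

    Illustrative only: models the documented query parameters, not
    CrewAI's actual retrieval code. scored_chunks is a list of
    (score, content) tuples.
    """
    kept = [c for c in scored_chunks if c[0] >= score_threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return kept[:results_limit]


candidates = [
    (0.72, "reset via email"),
    (0.41, "account lockout"),
    (0.30, "billing FAQ"),      # below the 0.35 default -- dropped
    (0.55, "password policy"),
]
for score, content in filter_results(candidates):
    print(f"[{score:.2f}] {content}")
```

With the stricter 0.6 threshold used by automatic retrieval, only the 0.72 chunk above would survive, which is why direct queries default lower for exploration.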
Usage Context
Direct Knowledge Querying is an optional sixth step in the Knowledge RAG Pipeline, used when programmatic access is needed:
1. Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
2. Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
3. Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
4. Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
5. Retrieve relevant chunks during task execution (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval)
6. Optionally query knowledge directly (this principle)
Design Decisions
- Crew-level method -- The query method is on the Crew class, not the Agent class, because knowledge is ultimately managed at the crew level even when agent-level sources exist.
- Lower default threshold -- A lower score threshold (0.35 vs. 0.6) reflects the exploratory use case where developers want to see what is available rather than only the most confident matches.
- Optional return type -- The method returns None if no knowledge is configured, making it safe to call on any crew regardless of knowledge configuration.
- Same query interface -- Uses the same list[str] query format as the internal retrieval, ensuring consistency between direct and automatic queries.
Example Scenarios
Debugging Ingestion
```python
from crewai import Crew
from crewai.knowledge.source import PDFKnowledgeSource

crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/manual.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# Verify ingestion worked correctly
results = crew.query_knowledge(["password reset"])
if results:
    for r in results:
        print(f"[{r.score:.3f}] {r.content[:100]}")
else:
    print("No results found -- check ingestion")
```
Building a Search Interface
```python
# In a web application endpoint
def search_knowledge(user_query: str):
    results = crew.query_knowledge(
        query=[user_query],
        results_limit=10,
        score_threshold=0.3,
    )
    return [{"content": r.content, "score": r.score} for r in (results or [])]
```
Related Pages
- Implementation:CrewAIInc_CrewAI_Crew_Query_Knowledge -- Concrete method implementation
- Principle:CrewAIInc_CrewAI_Semantic_Retrieval -- Automatic retrieval during task execution
- Principle:CrewAIInc_CrewAI_Knowledge_Attachment -- Knowledge attachment that enables querying