

Principle:CrewAIInc CrewAI Direct Knowledge Querying

From Leeroopedia

Metadata

  • Principle Name -- Direct Knowledge Querying
  • Workflow -- Knowledge_RAG_Pipeline
  • Category -- Programmatic Access
  • Repository -- crewAIInc/crewAI
  • Implemented By -- Implementation:CrewAIInc_CrewAI_Crew_Query_Knowledge

Overview

An explicit programmatic interface for querying a crew's knowledge base outside of the normal agent execution flow, enabling direct retrieval for debugging, testing, or custom application logic. Direct Knowledge Querying provides an escape hatch from the automatic retrieval mechanism, giving developers full control over when and how knowledge is accessed.

Description

Direct Knowledge Querying provides a way to query the crew's knowledge base without running an agent execution loop. This is useful for testing that knowledge was ingested correctly, building custom retrieval interfaces, or integrating knowledge search into application code that operates outside the crew/agent framework.

While the standard RAG pipeline (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval) handles retrieval automatically during task execution, there are scenarios where direct programmatic access is essential:

  • Debugging ingestion -- Verifying that documents were chunked and embedded correctly by running test queries
  • Testing retrieval quality -- Evaluating whether the right chunks are returned for expected queries before deploying to production
  • Custom UIs -- Building search interfaces that expose the knowledge base to end users directly
  • Hybrid workflows -- Combining crew-based agent execution with direct knowledge queries in the same application
  • Pre-filtering -- Querying knowledge before task execution to determine whether sufficient context exists
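The pre-filtering scenario above can be sketched as a gate that runs before task execution. This is an illustrative, self-contained sketch: the `query_knowledge` stub, its indexed data, and `has_sufficient_context` are stand-ins for the real crew method, not crewAI code.

```python
# Stand-in for crew.query_knowledge: returns matching (content, score) hits
# above the threshold, or None when nothing matches (illustrative data).
def query_knowledge(queries, score_threshold=0.35):
    indexed = {"password reset": [("Reset via Settings > Security.", 0.8)]}
    hits = []
    for q in queries:
        hits.extend(h for h in indexed.get(q, []) if h[1] >= score_threshold)
    return hits or None

# Pre-filtering gate: only kick off the crew if enough context exists.
def has_sufficient_context(topic, minimum_hits=1):
    results = query_knowledge([topic])
    return results is not None and len(results) >= minimum_hits

print(has_sufficient_context("password reset"))  # knowledge exists
print(has_sufficient_context("refund policy"))   # no indexed knowledge
```

In a real application the gate would call `crew.query_knowledge` and fall back to a different workflow (or a clarifying question to the user) when it returns `None`.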

Theoretical Basis

Direct Knowledge Querying exposes the underlying vector search through a direct API, bypassing the agent reasoning layer. This follows the principle of separation of concerns:

  • The agent layer handles reasoning, planning, and response generation
  • The knowledge layer handles storage and retrieval of domain-specific information
  • Direct querying exposes the knowledge layer independently, without requiring the agent layer

This separation enables developers to interact with each layer independently, which is essential for testing, debugging, and building custom integrations.
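The two-layer separation can be illustrated with a minimal sketch. The class names, the keyword-match `query` method (standing in for vector search), and the canned chunks are all invented for illustration; crewAI's actual layers are more involved.

```python
# Knowledge layer: owns storage and retrieval, knows nothing about agents.
class KnowledgeLayer:
    def __init__(self, chunks):
        self.chunks = chunks

    def query(self, terms):
        # Naive substring match standing in for embedding-based search.
        return [c for c in self.chunks if any(t in c for t in terms)]

# Agent layer: owns reasoning, consumes the knowledge layer through its API.
class AgentLayer:
    def __init__(self, knowledge):
        self.knowledge = knowledge

    def answer(self, question):
        context = self.knowledge.query([question])
        return f"Answer based on {len(context)} chunk(s)."

knowledge = KnowledgeLayer(["reset password steps", "billing policy"])
# Direct querying: the knowledge layer used without any agent.
print(knowledge.query(["password"]))
# Agent flow: the same knowledge layer driving an agent's answer.
agent = AgentLayer(knowledge)
print(agent.answer("billing"))
```

Because the knowledge layer is usable on its own, tests and custom integrations can exercise retrieval without paying for (or waiting on) an agent reasoning loop.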

Query Parameters

  • query (list[str], required) -- One or more query strings to search for
  • results_limit (int, default 3) -- Maximum number of results to return
  • score_threshold (float, default 0.35) -- Minimum similarity score for inclusion

Note that the default score_threshold for direct querying (0.35) is lower than the default for automatic retrieval (0.6). This reflects the exploratory nature of direct queries, where users may want to see more results even at lower confidence levels.
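The effect of the two defaults can be shown with a self-contained sketch of threshold filtering. The chunks and scores are made up, and `filter_by_score` is an illustrative function, not crewAI's internal filter.

```python
# Hypothetical retrieval results with similarity scores (made-up data).
results = [
    {"content": "Reset your password via Settings > Security.", "score": 0.82},
    {"content": "Passwords must be at least 12 characters.", "score": 0.55},
    {"content": "Contact support for account lockouts.", "score": 0.41},
    {"content": "Release notes for version 2.3.", "score": 0.12},
]

# Keep results at or above the threshold, best-scoring first, capped at the limit.
def filter_by_score(results, score_threshold, results_limit):
    kept = [r for r in results if r["score"] >= score_threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)[:results_limit]

print(len(filter_by_score(results, 0.6, 3)))   # stricter automatic-retrieval default
print(len(filter_by_score(results, 0.35, 3)))  # looser direct-query default
```

With these sample scores, the 0.6 threshold keeps only the top match while 0.35 keeps three, which is the trade-off the lower direct-query default is making: broader visibility into what the store contains, at the cost of lower-confidence matches.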

Usage Context

Direct Knowledge Querying is an optional sixth step in the Knowledge RAG Pipeline, used when programmatic access is needed:

  1. Select and configure knowledge sources (see Principle:CrewAIInc_CrewAI_Knowledge_Source_Selection)
  2. Configure the embedding provider (see Principle:CrewAIInc_CrewAI_Embedding_Configuration)
  3. Ingest sources into vector storage (see Principle:CrewAIInc_CrewAI_Knowledge_Ingestion)
  4. Attach knowledge to a Crew or Agent (see Principle:CrewAIInc_CrewAI_Knowledge_Attachment)
  5. Retrieve relevant chunks during task execution (see Principle:CrewAIInc_CrewAI_Semantic_Retrieval)
  6. Optionally query knowledge directly (this principle)

Design Decisions

  • Crew-level method -- The query method is on the Crew class, not the Agent class, because knowledge is ultimately managed at the crew level even when agent-level sources exist.
  • Lower default threshold -- A lower score threshold (0.35 vs. 0.6) reflects the exploratory use case where developers want to see what is available rather than only the most confident matches.
  • Optional return type -- The method returns None if no knowledge is configured, making it safe to call on any crew regardless of knowledge configuration.
  • Same query interface -- Uses the same list[str] query format as the internal retrieval, ensuring consistency between direct and automatic queries.

Example Scenarios

Debugging Ingestion

from crewai import Crew
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[
        PDFKnowledgeSource(file_paths=["docs/manual.pdf"])
    ],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

# Verify ingestion worked correctly
results = crew.query_knowledge(["password reset"])
if results:
    for r in results:
        print(f"[{r.score:.3f}] {r.content[:100]}")
else:
    print("No results found -- check ingestion")

Building a Search Interface

# In a web application endpoint; `crew` is assumed to be the configured
# Crew from the previous example.
def search_knowledge(user_query: str):
    results = crew.query_knowledge(
        query=[user_query],
        results_limit=10,
        score_threshold=0.3,
    )
    return [{"content": r.content, "score": r.score} for r in (results or [])]
