Overview
Concrete methods on BaseQueryEngine that execute queries against indexed knowledge, supporting synchronous, asynchronous, and decomposed (retrieve + synthesize) execution paths.
Description
The query method is the primary entry point for executing a RAG query. It accepts either a plain string or a QueryBundle object, instruments the call with callbacks, and delegates to the subclass-specific _query implementation. The aquery method does the same in an async context, delegating to _aquery. For fine-grained control, RetrieverQueryEngine exposes retrieve and synthesize as separate calls, so nodes can be inspected or modified between the two phases.
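The dispatch described above (normalize the input, then delegate to a subclass hook) can be sketched without the library. This is a minimal illustration, not the actual llama_index source: SketchQueryBundle, SketchQueryEngine, and EchoEngine are hypothetical stand-ins, and callback instrumentation is omitted.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Union


@dataclass
class SketchQueryBundle:
    """Hypothetical stand-in for QueryBundle: only wraps the query string."""

    query_str: str


SketchQueryType = Union[str, SketchQueryBundle]


class SketchQueryEngine(ABC):
    """Hypothetical base class mirroring the query()/_query() split."""

    def query(self, str_or_query_bundle: SketchQueryType) -> str:
        # Normalize the input, then hand off to the subclass hook.
        return self._query(self._to_bundle(str_or_query_bundle))

    async def aquery(self, str_or_query_bundle: SketchQueryType) -> str:
        # Same normalization, but the hook is awaited.
        return await self._aquery(self._to_bundle(str_or_query_bundle))

    @staticmethod
    def _to_bundle(value: SketchQueryType) -> SketchQueryBundle:
        # A plain string is wrapped; an existing bundle passes through.
        if isinstance(value, str):
            return SketchQueryBundle(query_str=value)
        return value

    @abstractmethod
    def _query(self, query_bundle: SketchQueryBundle) -> str: ...

    @abstractmethod
    async def _aquery(self, query_bundle: SketchQueryBundle) -> str: ...


class EchoEngine(SketchQueryEngine):
    """Toy subclass: echoes the query instead of calling an LLM."""

    def _query(self, query_bundle: SketchQueryBundle) -> str:
        return f"echo: {query_bundle.query_str}"

    async def _aquery(self, query_bundle: SketchQueryBundle) -> str:
        return self._query(query_bundle)
```

With this shape, callers never need to know which input form a subclass expects: EchoEngine().query("hi") and EchoEngine().query(SketchQueryBundle("hi")) reach the same _query call.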
Usage
Call query() on any query engine instance with a natural language question. Access the returned Response object's .response for the answer text and .source_nodes for the provenance chain of contributing document chunks.
Code Reference
Source Location
- Repository: run-llama/llama_index
- File: llama-index-core/llama_index/core/base/base_query_engine.py
- Lines: L38-60 (query and aquery methods)
Signature
class BaseQueryEngine(ChainableMixin, PromptMixin):
    def query(
        self,
        str_or_query_bundle: QueryType,
    ) -> RESPONSE_TYPE:
        # Dispatches callbacks, converts str to QueryBundle if needed,
        # delegates to self._query(query_bundle)
        ...

    async def aquery(
        self,
        str_or_query_bundle: QueryType,
    ) -> RESPONSE_TYPE:
        # Async equivalent of query()
        # Delegates to self._aquery(query_bundle)
        ...


# Additional decomposed methods on RetrieverQueryEngine
class RetrieverQueryEngine(BaseQueryEngine):
    def retrieve(
        self,
        query_bundle: QueryBundle,
    ) -> List[NodeWithScore]:
        # Retrieves relevant nodes from the index
        ...

    def synthesize(
        self,
        query_bundle: QueryBundle,
        nodes: List[NodeWithScore],
        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
    ) -> RESPONSE_TYPE:
        # Synthesizes a response from retrieved nodes using the LLM
        ...
Import
# query is a method on any query engine instance
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
I/O Contract
Inputs (query / aquery)
| Name | Type | Required | Description |
|---|---|---|---|
| str_or_query_bundle | QueryType (Union[str, QueryBundle]) | Yes | The user's question as a plain string or a QueryBundle with additional metadata |
Inputs (retrieve)
| Name | Type | Required | Description |
|---|---|---|---|
| query_bundle | QueryBundle | Yes | The query to retrieve relevant nodes for |
Inputs (synthesize)
| Name | Type | Required | Description |
|---|---|---|---|
| query_bundle | QueryBundle | Yes | The original query for context in synthesis |
| nodes | List[NodeWithScore] | Yes | Retrieved nodes to use as context for the LLM |
| additional_source_nodes | Sequence[NodeWithScore] or None | No | Extra source nodes to include in the response metadata |
Outputs
| Name | Type | Description |
|---|---|---|
| response | RESPONSE_TYPE | Response object containing generated text and source attribution |
| response.response | str | The synthesized answer text from the LLM |
| response.source_nodes | List[NodeWithScore] | List of nodes (with scores) that contributed to the answer |
| response.metadata | dict | Additional metadata from the synthesis process |
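The output fields above lend themselves to a small attribution helper. The sketch below assumes only that the response exposes .source_nodes and that each node has .text and an optional .score. FakeNode and FakeResponse are hypothetical stand-ins so the example runs without the library, and format_sources is an illustrative name, not a library function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for NodeWithScore (only .text and .score)."""

    text: str
    score: Optional[float] = None


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Response object."""

    response: str
    source_nodes: List[FakeNode] = field(default_factory=list)


def format_sources(response, max_chars: int = 60) -> List[str]:
    """Return one ranked, human-readable line per source node."""
    lines = []
    for rank, node in enumerate(response.source_nodes, start=1):
        # Scores may be missing, so render a placeholder instead of failing.
        score = "n/a" if node.score is None else f"{node.score:.2f}"
        snippet = node.text[:max_chars].replace("\n", " ")
        lines.append(f"[{rank}] score={score} {snippet}")
    return lines
```

Because the helper is duck-typed, it should also accept a real response object as long as its nodes expose text and score in the same way.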
Usage Examples
Basic Query
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Execute a query
response = query_engine.query("What are the key findings?")
print(response.response)

# Inspect source nodes for attribution
for node in response.source_nodes:
    print(f"Score: {node.score}, Text: {node.text[:100]}...")
Async Query
import asyncio

# Assumes `index` was built as in the Basic Query example
async def ask_question():
    query_engine = index.as_query_engine(use_async=True)
    response = await query_engine.aquery("Summarize the document.")
    return response.response

result = asyncio.run(ask_question())
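One reason to prefer aquery is that several questions can be awaited concurrently. The following is a minimal sketch of that pattern using asyncio.gather. FakeAsyncEngine is a hypothetical stand-in that exposes only an aquery coroutine, and ask_all is an illustrative helper name.

```python
import asyncio
from typing import List, Sequence


class FakeAsyncEngine:
    """Hypothetical stand-in exposing only an aquery() coroutine."""

    async def aquery(self, question: str) -> str:
        # Yield control once, as a real network-bound call would.
        await asyncio.sleep(0)
        return f"answer to: {question}"


async def ask_all(engine, questions: Sequence[str]) -> List[str]:
    """Run all questions concurrently; results keep the input order."""
    results = await asyncio.gather(*(engine.aquery(q) for q in questions))
    return list(results)
```

A call such as asyncio.run(ask_all(engine, ["Q1", "Q2"])) returns the answers in the same order as the questions, regardless of which call finishes first.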
Decomposed Retrieve and Synthesize
from llama_index.core import QueryBundle

# Assumes `index` was built as in the Basic Query example
query_engine = index.as_query_engine()
query_bundle = QueryBundle(query_str="What are the risks?")

# Phase 1: Retrieve nodes
nodes = query_engine.retrieve(query_bundle)
print(f"Retrieved {len(nodes)} nodes")

# Inspect or filter nodes manually (score may be None, so guard the comparison)
filtered_nodes = [n for n in nodes if n.score is not None and n.score > 0.5]

# Phase 2: Synthesize from filtered nodes
response = query_engine.synthesize(query_bundle, filtered_nodes)
print(response.response)
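A fixed score threshold can filter out every node, leaving the synthesis phase with no context. One possible guard, sketched under the assumption that nodes expose an optional .score, is to fall back to the best-scoring nodes. ScoredNode is a hypothetical stand-in for NodeWithScore, and filter_by_score and min_keep are illustrative names, not library features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for NodeWithScore."""

    text: str
    score: Optional[float] = None


def filter_by_score(
    nodes: List[ScoredNode], threshold: float, min_keep: int = 1
) -> List[ScoredNode]:
    """Keep nodes scoring above threshold, with a top-scoring fallback."""
    # Nodes without a score cannot be compared, so they are set aside.
    scored = [n for n in nodes if n.score is not None]
    kept = [n for n in scored if n.score > threshold]
    if len(kept) >= min_keep:
        return kept
    # Fallback: keep the best-scoring nodes so synthesis still has context.
    return sorted(scored, key=lambda n: n.score, reverse=True)[:min_keep]
```

Whether a fallback is appropriate depends on the application; returning an empty list and skipping synthesis is an equally valid choice when low-confidence answers are worse than no answer.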
Related Pages
Implements Principle