Implementation:Run llama Llama index Query Engine Query

Knowledge Sources	LlamaIndex, LlamaIndex Query Engine
Domains	RAG, LLM_Integration
Last Updated	2026-02-11 00:00 GMT

Overview

Concrete methods on BaseQueryEngine that execute queries against indexed knowledge, supporting synchronous, asynchronous, and decomposed (retrieve + synthesize) execution paths.

Description

The query method is the primary entry point for executing a RAG query. It accepts either a plain string or a QueryBundle object, converts a plain string to a QueryBundle, instruments the call with callbacks, and delegates to the subclass-specific _query implementation. The aquery method does the same in an async context, delegating to _aquery. For fine-grained control, RetrieverQueryEngine additionally exposes retrieve and synthesize, which can be called independently so that nodes can be inspected or modified between the two phases.
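The dispatch pattern described above can be sketched without the library. This is a simplified stand-in, not LlamaIndex's actual source: SketchQueryBundle, SketchQueryEngine, and EchoEngine are hypothetical names introduced for illustration, and the callback instrumentation is left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Union


@dataclass
class SketchQueryBundle:
    """Hypothetical stand-in for QueryBundle: wraps the query string."""
    query_str: str


class SketchQueryEngine(ABC):
    """Hypothetical stand-in for the base class; callbacks are omitted."""

    def query(self, str_or_query_bundle: Union[str, SketchQueryBundle]) -> str:
        # Normalize the input: a plain string is wrapped in a bundle first.
        if isinstance(str_or_query_bundle, str):
            str_or_query_bundle = SketchQueryBundle(str_or_query_bundle)
        # Delegate to the subclass-specific implementation.
        return self._query(str_or_query_bundle)

    @abstractmethod
    def _query(self, query_bundle: SketchQueryBundle) -> str:
        ...


class EchoEngine(SketchQueryEngine):
    """Toy subclass that only echoes the query string back."""

    def _query(self, query_bundle: SketchQueryBundle) -> str:
        return f"echo: {query_bundle.query_str}"


engine = EchoEngine()
as_text = engine.query("What is RAG?")
as_bundle = engine.query(SketchQueryBundle("What is RAG?"))
```

Both calls reach _query with the same bundle, which is why callers can pass either form interchangeably.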

Usage

Call query() on any query engine instance with a natural language question. Access the returned Response object's .response for the answer text and .source_nodes for the provenance chain of contributing document chunks.

Code Reference

Source Location

Repository: run-llama/llama_index
File: llama-index-core/llama_index/core/base/base_query_engine.py
Lines: L38-60 (query and aquery methods)

Signature

class BaseQueryEngine(ChainableMixin, PromptMixin):
    def query(
        self,
        str_or_query_bundle: QueryType,
    ) -> RESPONSE_TYPE:
        # Dispatches callbacks, converts str to QueryBundle if needed,
        # delegates to self._query(query_bundle)
        ...

    async def aquery(
        self,
        str_or_query_bundle: QueryType,
    ) -> RESPONSE_TYPE:
        # Async equivalent of query()
        # Delegates to self._aquery(query_bundle)
        ...

# Additional decomposed methods on RetrieverQueryEngine
class RetrieverQueryEngine(BaseQueryEngine):
    def retrieve(
        self,
        query_bundle: QueryBundle,
    ) -> List[NodeWithScore]:
        # Retrieves relevant nodes from the index
        ...

    def synthesize(
        self,
        query_bundle: QueryBundle,
        nodes: List[NodeWithScore],
        additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
    ) -> RESPONSE_TYPE:
        # Synthesizes a response from retrieved nodes using the LLM
        ...

Import

# No import is required: query() is a method on any query engine instance.
# `index` is assumed to be an already-built index (see Usage Examples).
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")

I/O Contract

Inputs (query / aquery)

Name	Type	Required	Description
str_or_query_bundle	QueryType (Union[str, QueryBundle])	Yes	The user's question as a plain string or a QueryBundle with additional metadata

Inputs (retrieve)

Name	Type	Required	Description
query_bundle	QueryBundle	Yes	The query to retrieve relevant nodes for

Inputs (synthesize)

Name	Type	Required	Description
query_bundle	QueryBundle	Yes	The original query for context in synthesis
nodes	List[NodeWithScore]	Yes	Retrieved nodes to use as context for the LLM
additional_source_nodes	Sequence[NodeWithScore] or None	No	Extra source nodes to include in the response metadata

Outputs

Name	Type	Description
response	RESPONSE_TYPE	Response object containing generated text and source attribution
response.response	str	The synthesized answer text from the LLM
response.source_nodes	List[NodeWithScore]	List of nodes (with scores) that contributed to the answer
response.metadata	dict	Additional metadata from the synthesis process

Usage Examples

Basic Query

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Execute a query
response = query_engine.query("What are the key findings?")
print(response.response)

# Inspect source nodes for attribution
for node in response.source_nodes:
    print(f"Score: {node.score}, Text: {node.text[:100]}...")

Async Query

import asyncio

# Assumes `index` was built as in the Basic Query example
async def ask_question():
    query_engine = index.as_query_engine(use_async=True)
    response = await query_engine.aquery("Summarize the document.")
    return response.response

result = asyncio.run(ask_question())
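One common reason to prefer aquery is issuing several questions concurrently. The sketch below shows that pattern with asyncio.gather. It assumes only that the engine exposes an aquery coroutine; StubAsyncEngine and ask_many are hypothetical names, and the stub returns plain strings where a real engine would return response objects.

```python
import asyncio
from typing import List


class StubAsyncEngine:
    """Hypothetical stand-in exposing an aquery() coroutine."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"answer to: {question}"


async def ask_many(engine, questions: List[str]) -> List[str]:
    # Schedule all queries at once; gather returns results in input order.
    return list(await asyncio.gather(*(engine.aquery(q) for q in questions)))


answers = asyncio.run(ask_many(StubAsyncEngine(), ["A?", "B?"]))
```

Because gather preserves input order, each answer can be matched back to its question by position.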

Decomposed Retrieve and Synthesize

from llama_index.core import QueryBundle

query_engine = index.as_query_engine()
query_bundle = QueryBundle(query_str="What are the risks?")

# Phase 1: Retrieve nodes
nodes = query_engine.retrieve(query_bundle)
print(f"Retrieved {len(nodes)} nodes")

# Inspect or filter nodes manually (score is optional, so guard against None)
filtered_nodes = [n for n in nodes if n.score is not None and n.score > 0.5]

# Phase 2: Synthesize from filtered nodes
response = query_engine.synthesize(query_bundle, filtered_nodes)
print(response.response)
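A fixed threshold in the filtering step can remove every node, leaving the synthesis phase with no context. The sketch below shows one way to guard against that by falling back to the best-scoring nodes. ScoredNode and filter_by_score are hypothetical helpers written for illustration, not part of LlamaIndex; ScoredNode only mimics the text and optional score of a retrieved node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node: text plus optional score."""
    text: str
    score: Optional[float] = None


def filter_by_score(
    nodes: List[ScoredNode], threshold: float, min_keep: int = 1
) -> List[ScoredNode]:
    """Keep nodes scoring above threshold; fall back to the top min_keep."""
    # Nodes without a score cannot be compared, so they are set aside.
    scored = [n for n in nodes if n.score is not None]
    kept = [n for n in scored if n.score > threshold]
    if len(kept) >= min_keep:
        return kept
    # Too few survivors: fall back to the best-scoring nodes instead.
    return sorted(scored, key=lambda n: n.score, reverse=True)[:min_keep]


nodes = [ScoredNode("a", 0.9), ScoredNode("b", 0.3), ScoredNode("c", None)]
strict = filter_by_score(nodes, threshold=0.5)
fallback = filter_by_score(nodes, threshold=0.95, min_keep=2)
```

Whether a fallback is appropriate depends on the application; in some cases returning no answer is preferable to synthesizing from weakly related context.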

Related Pages

Implements Principle

Principle:Run_llama_Llama_index_Query_Execution
