Implementation:Run llama Llama index ReActAgent Run
Overview
BaseWorkflowAgent.run is the entry point for executing a ReAct agent. It accepts a user message, initializes the workflow context, and runs the full ReAct loop (Thought-Action-Observation cycles) until the agent produces a final answer or the iteration limit is reached. The method returns a WorkflowHandler; awaiting the handler gives the final AgentOutput.
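To make the Thought-Action-Observation cycle concrete, here is a minimal standard-library sketch of such a loop. It is not LlamaIndex code: `scripted_llm`, `TOOLS`, and `react_loop` are hypothetical stand-ins, and raising on the iteration limit is an illustrative choice rather than the library's documented behavior.

```python
from typing import Callable, Dict, List, Tuple


def scripted_llm(history: List[str]) -> Tuple[str, str, str]:
    """Hypothetical fake LLM: returns (thought, kind, payload)."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I need to multiply.", "action", "multiply 6 7")
    # The last history entry is the observation produced by the tool.
    return ("I have the result.", "answer", history[-1].split(": ", 1)[1])


# Hypothetical tool registry keyed by tool name.
TOOLS: Dict[str, Callable[..., int]] = {"multiply": lambda a, b: a * b}


def react_loop(question: str, max_iterations: int = 20) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_iterations):
        thought, kind, payload = scripted_llm(history)
        history.append(f"Thought: {thought}")
        if kind == "answer":
            return payload  # final answer ends the loop
        name, *args = payload.split()
        observation = TOOLS[name](*(int(a) for a in args))
        history.append(f"Action: {payload}")
        history.append(f"Observation: {observation}")
    raise RuntimeError("iteration limit reached without a final answer")
```

With the scripted model, the first pass selects the tool, the second pass reads the observation and returns it as the answer.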
Principle:Run_llama_Llama_index_Agent_Execution
Source File
llama-index-core/llama_index/core/agent/workflow/base_agent.py, Lines 722-751
Method Signature
def run(
    self,
    user_msg: Optional[Union[str, ChatMessage]] = None,
    chat_history: Optional[List[ChatMessage]] = None,
    memory: Optional[BaseMemory] = None,
    ctx: Optional[Context] = None,
    max_iterations: Optional[int] = None,
    early_stopping_method: Optional[Literal["force", "generate"]] = None,
    start_event: Optional[AgentWorkflowStartEvent] = None,
    **kwargs: Any,
) -> WorkflowHandler:
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| user_msg | Optional[Union[str, ChatMessage]] | None | The user's input message. Can be a plain string (auto-converted to ChatMessage) or a pre-built ChatMessage. Either this or chat_history must be provided. |
| chat_history | Optional[List[ChatMessage]] | None | Pre-existing chat history to initialize the memory with. Used for multi-turn conversations. |
| memory | Optional[BaseMemory] | None | Custom memory instance. If not provided, a ChatMemoryBuffer is created using the agent's LLM. |
| ctx | Optional[Context] | None | Pre-existing workflow context. If provided and running, enables human-in-the-loop (HITL) patterns. |
| max_iterations | Optional[int] | None | Maximum number of Thought-Action-Observation cycles. Defaults to 20 if not specified. |
| early_stopping_method | Optional[Literal["force", "generate"]] | None | Override for the agent's early stopping behavior. Falls back to the agent's configured value. |
| start_event | Optional[AgentWorkflowStartEvent] | None | Pre-built start event. If provided, all other parameters except ctx are ignored. |
| **kwargs | Any | -- | Additional keyword arguments passed through to the start event. |
Return Value
Returns a WorkflowHandler -- an async-compatible handler that manages the workflow execution. The handler can be used in two ways:
from llama_index.core.agent.workflow import AgentStream

# Await the final result
handler = agent.run("What is the capital of France?")
result = await handler  # Returns AgentOutput

# Stream events during execution
handler = agent.run("Analyze this data...")
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        print(event.delta, end="")
result = await handler
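The two usage patterns above work because the handler is both awaitable and a source of events. The following standard-library sketch shows one way such an object can be built, assuming a future-like handler backed by an event queue. `DemoHandler`, `emit`, and `fake_run` are hypothetical names, not the library's classes.

```python
import asyncio
from typing import Any, AsyncIterator


class DemoHandler(asyncio.Future):
    """Hypothetical stand-in: awaitable result plus an event stream."""

    def __init__(self) -> None:
        super().__init__()
        self._events: asyncio.Queue = asyncio.Queue()

    def emit(self, event: Any) -> None:
        self._events.put_nowait(event)

    async def stream_events(self) -> AsyncIterator[Any]:
        while True:
            event = await self._events.get()
            if event is None:  # sentinel: the run has finished
                return
            yield event


async def fake_run(handler: DemoHandler) -> None:
    """Pretend agent run: emit two text deltas, then set the final result."""
    for delta in ("Par", "is"):
        handler.emit(delta)
        await asyncio.sleep(0)
    handler.emit(None)
    handler.set_result("Paris")


async def main() -> tuple:
    handler = DemoHandler()
    task = asyncio.create_task(fake_run(handler))
    chunks = [event async for event in handler.stream_events()]
    result = await handler  # resolves once set_result has been called
    await task
    return chunks, result
```

Calling `asyncio.run(main())` drains the stream first and then awaits the result, mirroring the streaming pattern shown above.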
Internal Workflow Steps
The run method triggers a sequence of workflow steps, each decorated with @step:
Step 1: init_run
Source: base_agent.py, Lines 380-432
Processes the AgentWorkflowStartEvent and sets up the context:
@step
async def init_run(self, ctx: Context, ev: AgentWorkflowStartEvent) -> AgentInput:
    await self._init_context(ctx, ev)
    # Converts user_msg to ChatMessage if string
    # Adds messages to memory
    # Returns AgentInput with full message history
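The message handling described in the comments above can be sketched with plain dataclasses. `Msg`, `Memory`, and `init_messages` are hypothetical stand-ins for ChatMessage, the memory object, and the step body. Replacing the memory contents with chat_history is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Msg:  # hypothetical stand-in for ChatMessage
    role: str
    content: str


@dataclass
class Memory:  # hypothetical stand-in for a memory buffer
    messages: List[Msg] = field(default_factory=list)


def init_messages(
    user_msg: Optional[Union[str, Msg]] = None,
    chat_history: Optional[List[Msg]] = None,
    memory: Optional[Memory] = None,
) -> List[Msg]:
    memory = memory if memory is not None else Memory()
    if chat_history:
        # Assumption: provided history seeds (replaces) the memory contents.
        memory.messages = list(chat_history)
    if isinstance(user_msg, str):
        user_msg = Msg(role="user", content=user_msg)  # auto-convert plain strings
    if user_msg is not None:
        memory.messages.append(user_msg)
    if not memory.messages:
        raise ValueError("provide user_msg or chat_history")
    return list(memory.messages)  # full message history for the next step
```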
Step 2: setup_agent
Source: base_agent.py, Lines 434-460
Prepares the LLM input with system prompt and state:
@step
async def setup_agent(self, ctx: Context, ev: AgentInput) -> AgentSetup:
    # Prepends system prompt
    # Injects agent state into the last message
    # Returns AgentSetup with formatted LLM input
    ...
Step 3: run_agent_step
Source: base_agent.py, Lines 462-477
Invokes the agent's take_step method (overridden by ReActAgent):
@step
async def run_agent_step(self, ctx: Context, ev: AgentSetup) -> AgentOutput:
    # user_msg_str and memory are read from the workflow context (lookup omitted in this excerpt)
    tools = await self.get_tools(user_msg_str)
    agent_output = await self.take_step(ctx, ev.input, tools, memory)
    ctx.write_event_to_stream(agent_output)
    return agent_output
Step 4: parse_agent_output
Source: base_agent.py, Lines 517-617
Routes the agent's output to the appropriate next step:
- If tool_calls are present: emits ToolCall events for each call
- If retry_messages are present: loops back to AgentInput with correction context
- If no tool calls (final answer): calls finalize() and emits StopEvent
- Enforces max_iterations and applies early stopping
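The routing rules listed above can be summarized as a small decision function. This is a sketch under assumptions: `Output` and `route` are hypothetical, the string labels are illustrative, and the order in which the real step checks the iteration limit may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Output:  # hypothetical stand-in for AgentOutput
    response: str = ""
    tool_calls: List[str] = field(default_factory=list)
    retry_messages: List[str] = field(default_factory=list)


def route(output: Output, iteration: int, max_iterations: int = 20) -> str:
    """Return an illustrative label for the next step."""
    wants_more = bool(output.retry_messages or output.tool_calls)
    if not wants_more:
        return "stop"  # final answer: finalize and emit a stop event
    if iteration >= max_iterations:
        return "limit"  # assumption: the limit only matters if the loop would continue
    if output.retry_messages:
        return "retry"  # loop back with correction context
    return "call_tools"  # emit one tool-call event per requested call
```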
Step 5: call_tool
Source: base_agent.py, Lines 619-654
Executes a single tool call:
@step
async def call_tool(self, ctx: Context, ev: ToolCall) -> ToolCallResult:
    # Looks up tool by name
    # Injects Context if the tool requires it
    # Calls tool.acall(**tool_kwargs)
    # Returns ToolCallResult with output (or error)
    ...
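The lookup-then-call pattern with error capture described above can be sketched as follows. `Result`, `TOOLS`, `multiply`, and this standalone `call_tool` are hypothetical. The point is that a missing tool or a failing call becomes an error result for the agent to observe instead of an exception that aborts the run.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Result:  # hypothetical stand-in for ToolCallResult
    tool_name: str
    output: str
    is_error: bool = False


async def multiply(a: int, b: int) -> int:
    return a * b


# Hypothetical registry of async tools keyed by name.
TOOLS = {"multiply": multiply}


async def call_tool(name: str, kwargs: Dict[str, Any]) -> Result:
    tool = TOOLS.get(name)
    if tool is None:
        return Result(name, f"Tool {name} not found", is_error=True)
    try:
        return Result(name, str(await tool(**kwargs)))
    except Exception as exc:  # report the failure back to the agent
        return Result(name, str(exc), is_error=True)
```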
Step 6: aggregate_tool_results
Source: base_agent.py, Lines 656-720
Collects all tool results and loops back to the reasoning step:
@step
async def aggregate_tool_results(self, ctx: Context, ev: ToolCallResult) -> Union[AgentInput, StopEvent, None]:
    # Collects all ToolCallResult events
    # Delegates to handle_tool_call_results (adds ObservationReasoningSteps)
    # If return_direct, finalizes and stops
    # Otherwise, returns AgentInput to continue the loop
    ...
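The `None` in the return type above suggests that the step is invoked once per tool result and only proceeds when every pending result has arrived. A minimal sketch of that buffering idea, assuming the expected count is known up front (`Collector` is a hypothetical helper, not a library class):

```python
from typing import List, Optional


class Collector:
    """Hypothetical buffer that releases results only once all have arrived."""

    def __init__(self, expected: int) -> None:
        self.expected = expected
        self.buffer: List[str] = []

    def collect(self, result: str) -> Optional[List[str]]:
        self.buffer.append(result)
        if len(self.buffer) < self.expected:
            return None  # still waiting for the remaining tool results
        done, self.buffer = self.buffer, []  # hand over and reset for the next round
        return done
```

For two parallel tool calls, the first `collect` returns `None` and the second returns both results, at which point the loop can continue with a new agent input.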
ReActAgent.take_step
Source: react_agent.py, Lines 117-258
This is the ReAct-specific core logic within each iteration:
async def take_step(self, ctx, llm_input, tools, memory) -> AgentOutput:
    # 1. Format input with ReActChatFormatter (tools + history + reasoning chain)
    # 2. Call LLM (streaming or non-streaming)
    # 3. Parse output with ReActOutputParser
    # 4. Handle errors: empty response or parse failure -> retry
    # 5. Append reasoning step to chain
    # 6. If ResponseReasoningStep (is_done=True) -> return final AgentOutput
    # 7. If ActionReasoningStep -> create ToolSelection and return AgentOutput with tool_calls
    ...
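Steps 3, 4, 6, and 7 above hinge on parsing the model's text into either an action or a final answer. The sketch below is a simplified stand-in for that parsing, not ReActOutputParser itself. It assumes the output uses `Action:`, `Action Input:` (a JSON object), and `Answer:` markers, and `parse_react` is a hypothetical helper.

```python
import json
import re

# Simplified patterns; the real parser handles more edge cases.
ACTION_RE = re.compile(r"Action:\s*(\w+)\s*Action Input:\s*(\{.*\})", re.DOTALL)
ANSWER_RE = re.compile(r"Answer:\s*(.*)", re.DOTALL)


def parse_react(text: str) -> tuple:
    """Return ("action", tool_name, kwargs) or ("answer", text)."""
    match = ACTION_RE.search(text)
    if match:
        return ("action", match.group(1), json.loads(match.group(2)))
    match = ANSWER_RE.search(text)
    if match:
        return ("answer", match.group(1).strip())
    # Empty or unparseable output: the caller would ask the LLM to retry.
    raise ValueError("could not parse output")
```

An action result maps onto a tool selection, an answer result ends the loop, and the `ValueError` corresponds to the retry path in step 4.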
Usage Examples
Basic Execution
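from llama_index.core.agent.workflow import ReActAgent

# multiply_tool, search_tool, and llm are assumed to be defined elsewhere
agent = ReActAgent(tools=[multiply_tool, search_tool], llm=llm)

# Simple run
handler = agent.run("What is 6 times 7?")
response = await handler
print(response)  # e.g. "42"; the exact wording depends on the LLM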
Streaming Execution
from llama_index.core.agent.workflow import AgentStream

handler = agent.run("Research the latest AI trends and summarize them.")
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        print(event.delta, end="", flush=True)
result = await handler
Multi-Turn Conversation
from llama_index.core.memory import ChatMemoryBuffer
memory = ChatMemoryBuffer.from_defaults(llm=llm)
# First turn
handler = agent.run("What is the population of France?", memory=memory)
result1 = await handler
# Second turn -- memory carries over
handler = agent.run("How does that compare to Germany?", memory=memory)
result2 = await handler
With Custom Iteration Limits
handler = agent.run(
    "Solve this complex multi-step problem...",
    max_iterations=50,
    early_stopping_method="generate",
)
result = await handler
Import Statement
from llama_index.core.agent.workflow import ReActAgent
# The run() method is called on the agent instance:
# handler = agent.run("user message")
# result = await handler