Principle:Run llama Llama index Agent Execution
Overview
Agent Execution describes the runtime behavior of a ReAct agent -- the multi-step loop where the agent reasons, acts, observes, and repeats until it arrives at a final answer. Understanding the execution loop is essential for debugging agent behavior, handling streaming output, controlling iteration limits, and building reliable agentic applications.
AI Agents ReAct Workflow Execution LlamaIndex
The ReAct Execution Loop
The ReAct agent executes as a LlamaIndex Workflow -- an event-driven state machine where each step processes an event and emits the next. The full execution loop proceeds as follows:
Phase 1: Initialization
When the user calls agent.run("user question"), the workflow starts:
- The user message is converted to a
ChatMessageand stored in memory - The workflow context is initialized (memory, state, iteration counters)
- An
AgentInputevent is emitted containing the full chat history
Phase 2: Agent Setup
- The system prompt (if any) is prepended to the LLM input
- The agent's state (if any) is injected into the last user message
- An
AgentSetupevent is emitted, triggering the main reasoning step
Phase 3: Reasoning Step (take_step)
This is the core of the ReAct loop. For each iteration:
- The ReActChatFormatter formats the tools, chat history, and current reasoning chain into the LLM prompt
- The LLM generates a response (streaming or non-streaming)
- The ReActOutputParser parses the response into a structured reasoning step:
- ActionReasoningStep -- The LLM wants to call a tool (contains
actionandaction_input) - ResponseReasoningStep -- The LLM has a final answer (contains the response text)
- ActionReasoningStep -- The LLM wants to call a tool (contains
- The reasoning step is appended to the reasoning chain in the context store
Phase 4: Tool Execution
If the reasoning step is an action:
- A
ToolCallevent is dispatched for each tool call - The tool is looked up by name, invoked with the parsed arguments, and returns a
ToolOutput - An
ObservationReasoningStepis appended to the reasoning chain - The loop returns to Phase 3 for the next reasoning step
Phase 5: Finalization
When the LLM produces a final answer (or the iteration limit is reached):
- The full reasoning chain is serialized and stored in memory
- The "Answer:" prefix is stripped from the response text
- The reasoning chain is cleared for the next invocation
- The final
AgentOutputis returned
Iteration Flow Diagram
The execution follows this pattern:
User Message
|
v
[init_run] --> AgentInput
|
v
[setup_agent] --> AgentSetup
|
v
[run_agent_step] --> AgentOutput
|
v
[parse_agent_output]
|
+--> Has tool_calls? --> [call_tool] --> ToolCallResult
| |
| v
| [aggregate_tool_results] --> AgentInput (loop back)
|
+--> Has retry_messages? --> AgentInput (loop back with error correction)
|
+--> Final answer? --> [finalize] --> StopEvent (done)
Streaming During Execution
When streaming=True (the default), the agent emits AgentStream events during each LLM call. These events contain:
- delta -- The incremental token produced by the LLM
- response -- The accumulated response text so far
- current_agent_name -- Which agent is currently generating
- thinking_delta -- Optional thinking/reasoning tokens (for models that support it)
Streaming events are written to the workflow's event stream and can be consumed by the caller in real-time, enabling progressive display of the agent's reasoning.
Early Stopping Strategies
The agent enforces a maximum iteration count (default: 20) to prevent infinite loops. Two strategies are available when the limit is reached:
Force (Default)
Raises a WorkflowRuntimeError with a descriptive message. This is the safest option because it makes failures explicit:
# Default behavior -- raises an error at 20 iterations
agent = ReActAgent(tools=[...], llm=llm, early_stopping_method="force")
Generate
Makes one final LLM call with a special prompt asking the model to synthesize a response from the information gathered so far. This provides a "best effort" answer rather than an error:
# Graceful degradation -- generates a response at the limit
agent = ReActAgent(tools=[...], llm=llm, early_stopping_method="generate")
The generate prompt instructs the model: "You have reached the maximum number of iterations. Please provide your best answer based on the information gathered so far."
Error Recovery
The execution loop includes built-in error recovery for common failure modes:
- Empty LLM response -- If the LLM returns an empty message, a retry is triggered with an error message instructing the LLM to follow the Thought/Action/Answer format.
- Parse errors -- If the
ReActOutputParsercannot parse the LLM output, a retry is triggered with the parse error and format instructions, giving the LLM a chance to correct itself. - Tool not found -- If the LLM requests a tool that does not exist, a
ToolOutputwithis_error=Trueis returned, and the observation is added to the reasoning chain so the LLM can try a different tool. - Tool execution errors -- If a tool raises an exception, the error message is captured as the tool output and returned as an observation.
In all retry cases, the original (malformed) LLM message plus a correction prompt are appended to the input, and the reasoning step is retried.
Memory Management
The agent uses a ChatMemoryBuffer (or any BaseMemory implementation) to maintain conversation history across the execution loop:
- The user message is added at initialization
- The full reasoning chain (all thoughts, actions, observations) is serialized into a single assistant message at finalization
- This ensures that in follow-up queries, the agent has access to its prior reasoning
Knowledge Sources
ReAct: Synergizing Reasoning and Acting in Language Models LlamaIndex Agents Documentation LlamaIndex GitHub Repository