Principle:Run llama Llama index Agent Execution

From Leeroopedia

Overview

Agent Execution describes the runtime behavior of a ReAct agent -- the multi-step loop where the agent reasons, acts, observes, and repeats until it arrives at a final answer. Understanding the execution loop is essential for debugging agent behavior, handling streaming output, controlling iteration limits, and building reliable agentic applications.

Topics: AI Agents, ReAct, Workflow Execution, LlamaIndex

The ReAct Execution Loop

The ReAct agent executes as a LlamaIndex Workflow -- an event-driven state machine where each step processes an event and emits the next. The full execution loop proceeds as follows:

Phase 1: Initialization

When the user calls agent.run("user question"), the workflow starts:

  1. The user message is converted to a ChatMessage and stored in memory
  2. The workflow context is initialized (memory, state, iteration counters)
  3. An AgentInput event is emitted containing the full chat history

Phase 2: Agent Setup

  1. The system prompt (if any) is prepended to the LLM input
  2. The agent's state (if any) is injected into the last user message
  3. An AgentSetup event is emitted, triggering the main reasoning step
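The two input-preparation steps above can be sketched in plain Python. This is a simplified model rather than LlamaIndex's own code: `Message`, `build_llm_input`, and the wording of the state template are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus its text."""
    role: str
    content: str


def build_llm_input(
    history: List[Message],
    system_prompt: Optional[str] = None,
    state: Optional[dict] = None,
) -> List[Message]:
    """Return a copy of the history prepared for the LLM call."""
    # Copy so the messages stored in memory are never mutated.
    messages = [Message(m.role, m.content) for m in history]
    if state:
        # Inject the state into the most recent user message.
        for msg in reversed(messages):
            if msg.role == "user":
                msg.content = (
                    f"Current state:\n{state}\n\nCurrent message:\n{msg.content}"
                )
                break
    if system_prompt:
        # The system prompt always goes first.
        messages.insert(0, Message("system", system_prompt))
    return messages


history = [Message("user", "How many items are in my cart?")]
llm_input = build_llm_input(history, system_prompt="Be brief.", state={"cart": 3})
```

Working on a copy keeps the stored history free of prompt scaffolding, so the system prompt and state can change between runs without rewriting memory.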

Phase 3: Reasoning Step (take_step)

This is the core of the ReAct loop. For each iteration:

  1. The ReActChatFormatter formats the tools, chat history, and current reasoning chain into the LLM prompt
  2. The LLM generates a response (streaming or non-streaming)
  3. The ReActOutputParser parses the response into a structured reasoning step:
    • ActionReasoningStep -- The LLM wants to call a tool (contains action and action_input)
    • ResponseReasoningStep -- The LLM has a final answer (contains the response text)
  4. The reasoning step is appended to the reasoning chain in the context store
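To make the parsing step concrete, here is a minimal sketch of turning raw LLM text into one of the two step types. It is not the real ReActOutputParser: the class names `ActionStep` and `ResponseStep`, the regular expressions, and the assumption that the action input is a JSON object are all illustrative choices.

```python
import json
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class ActionStep:
    """The LLM wants to call a tool."""
    thought: str
    action: str
    action_input: dict


@dataclass
class ResponseStep:
    """The LLM has a final answer."""
    thought: str
    response: str


_ACTION_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Action:\s*(?P<action>[^\n]+?)\s*"
    r"Action Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)
_ANSWER_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*Answer:\s*(?P<answer>.*)", re.DOTALL
)


def parse_react_output(text: str) -> Union[ActionStep, ResponseStep]:
    """Parse LLM text into a structured step, or raise ValueError."""
    action_match = _ACTION_RE.search(text)
    if action_match:
        return ActionStep(
            thought=action_match["thought"],
            action=action_match["action"].strip(),
            # Assumption: the tool arguments are written as a JSON object.
            action_input=json.loads(action_match["input"]),
        )
    answer_match = _ANSWER_RE.search(text)
    if answer_match:
        return ResponseStep(answer_match["thought"], answer_match["answer"].strip())
    raise ValueError(f"Could not parse output: {text!r}")


step = parse_react_output(
    'Thought: I need to add.\nAction: add\nAction Input: {"a": 1, "b": 2}'
)
final = parse_react_output("Thought: I can answer now.\nAnswer: The sum is 3.")
```

Raising on unparseable text gives the caller a clear point at which to trigger a retry.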

Phase 4: Tool Execution

If the reasoning step is an action:

  1. A ToolCall event is dispatched for each tool call
  2. The tool is looked up by name, invoked with the parsed arguments, and returns a ToolOutput
  3. An ObservationReasoningStep is appended to the reasoning chain
  4. The loop returns to Phase 3 for the next reasoning step
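The lookup, invocation, and observation steps can be sketched as follows. `ToolResult` and `call_tool` are hypothetical stand-ins for the library's ToolOutput and tool-calling step, and tools are modelled as plain Python callables in a dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolResult:
    """Hypothetical, simplified counterpart of a tool output."""
    tool_name: str
    content: str
    is_error: bool = False


def call_tool(
    tools: Dict[str, Callable[..., Any]], name: str, kwargs: dict
) -> ToolResult:
    """Look up a tool by name and invoke it with the parsed arguments."""
    tool = tools.get(name)
    if tool is None:
        # Unknown tool: report it as an error result instead of raising.
        return ToolResult(name, f"Tool '{name}' not found.", is_error=True)
    try:
        return ToolResult(name, str(tool(**kwargs)))
    except Exception as exc:
        # A failing tool becomes an observation the LLM can react to.
        return ToolResult(name, f"Error: {exc}", is_error=True)


def add(a: int, b: int) -> int:
    return a + b


reasoning_chain: List[str] = []
result = call_tool({"add": add}, "add", {"a": 2, "b": 3})
# The observation is appended to the chain before the next reasoning step.
reasoning_chain.append(f"Observation: {result.content}")

missing = call_tool({"add": add}, "multiply", {"a": 2, "b": 3})
bad_args = call_tool({"add": add}, "add", {"a": 2})
```

Returning errors as ordinary results keeps the loop running and lets the next reasoning step pick a different tool or different arguments.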

Phase 5: Finalization

When the LLM produces a final answer (or, with early_stopping_method="generate", a best-effort answer at the iteration limit):

  1. The full reasoning chain is serialized and stored in memory
  2. The "Answer:" prefix is stripped from the response text
  3. The reasoning chain is cleared for the next invocation
  4. The final AgentOutput is returned
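The finalization steps can be sketched as a single function. This is an illustrative model only: memory is represented as a list of (role, content) tuples and the reasoning chain as a list of strings, neither of which matches the library's actual types.

```python
from typing import List, Tuple

ANSWER_PREFIX = "Answer:"


def finalize(
    response_text: str,
    reasoning_chain: List[str],
    memory: List[Tuple[str, str]],
) -> str:
    """Store the chain in memory, strip the prefix, and reset the chain."""
    # 1. Serialize the whole chain into a single assistant message.
    memory.append(("assistant", "\n".join(reasoning_chain)))
    # 2. Strip the "Answer:" prefix from the user-facing text.
    final_text = response_text.strip()
    if final_text.startswith(ANSWER_PREFIX):
        final_text = final_text[len(ANSWER_PREFIX):].strip()
    # 3. Clear the chain in place so the next run starts empty.
    reasoning_chain.clear()
    # 4. Return the final output.
    return final_text


memory = [("user", "What is 2 + 3?")]
chain = ["Thought: I should add.", "Action: add", "Observation: 5", "Answer: 5"]
answer = finalize("Answer: 5", chain, memory)
```

Because the serialized chain lands in memory as one assistant message, a follow-up question in the same conversation can see the earlier observations.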

Iteration Flow Diagram

The execution follows this pattern:

User Message
    |
    v
[init_run] --> AgentInput
    |
    v
[setup_agent] --> AgentSetup
    |
    v
[run_agent_step] --> AgentOutput
    |
    v
[parse_agent_output]
    |
    +--> Has tool_calls? --> [call_tool] --> ToolCallResult
    |                            |
    |                            v
    |                    [aggregate_tool_results] --> AgentInput (loop back)
    |
    +--> Has retry_messages? --> AgentInput (loop back with error correction)
    |
    +--> Final answer? --> [finalize] --> StopEvent (done)
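The diagram can be read as a dispatch table from event type to step function. The toy driver below illustrates that idea under simplifying assumptions: the event classes only borrow their names from the diagram, the tool call is folded into the parsing step, retries are omitted, and the "CALL <name>" reply convention is invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class AgentInput:
    messages: List[str]


@dataclass
class AgentOutput:
    text: str
    messages: List[str]


@dataclass
class ToolCallResult:
    observation: str
    messages: List[str]


@dataclass
class StopEvent:
    result: str


def make_steps(llm_replies: Iterator[str], tools: Dict[str, Callable[[], str]]):
    """Build the event-type -> step-function table."""

    def run_agent_step(ev: AgentInput):
        # A scripted iterator stands in for the LLM call.
        return AgentOutput(text=next(llm_replies), messages=ev.messages)

    def parse_agent_output(ev: AgentOutput):
        if ev.text.startswith("CALL "):  # hypothetical tool-call convention
            name = ev.text[len("CALL "):]
            return ToolCallResult(tools[name](), ev.messages + [ev.text])
        return StopEvent(result=ev.text)

    def aggregate_tool_results(ev: ToolCallResult):
        # Loop back with the observation added to the input.
        return AgentInput(ev.messages + [f"Observation: {ev.observation}"])

    return {
        AgentInput: run_agent_step,
        AgentOutput: parse_agent_output,
        ToolCallResult: aggregate_tool_results,
    }


def run(event, steps):
    """Dispatch events to steps until a StopEvent is produced."""
    trace = []
    while not isinstance(event, StopEvent):
        trace.append(type(event).__name__)
        event = steps[type(event)](event)
    return event.result, trace


steps = make_steps(iter(["CALL clock", "It is noon."]), {"clock": lambda: "12:00"})
result, trace = run(AgentInput(["What time is it?"]), steps)
```

The trace shows one pass around the loop (input, output, tool result) followed by a second pass that ends in the final answer.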

Streaming During Execution

When streaming=True (the default), the agent emits AgentStream events during each LLM call. These events contain:

  • delta -- The incremental token produced by the LLM
  • response -- The accumulated response text so far
  • current_agent_name -- Which agent is currently generating
  • thinking_delta -- Optional thinking/reasoning tokens (for models that support it)

Streaming events are written to the workflow's event stream and can be consumed by the caller in real time, enabling progressive display of the agent's reasoning.
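The relationship between delta and response can be illustrated with a small async generator. `StreamChunk` and `stream_tokens` are hypothetical names, and a fixed token list stands in for the LLM; only the field names are taken from the list above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterable, List


@dataclass
class StreamChunk:
    """Simplified model of one streaming event."""
    delta: str               # the new token
    response: str            # everything generated so far
    current_agent_name: str  # which agent produced it


async def stream_tokens(
    tokens: Iterable[str], agent_name: str
) -> AsyncIterator[StreamChunk]:
    response = ""
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        response += token
        yield StreamChunk(
            delta=token, response=response, current_agent_name=agent_name
        )


async def collect(tokens: Iterable[str], agent_name: str = "agent") -> List[StreamChunk]:
    # A real consumer would print each delta as it arrives.
    return [chunk async for chunk in stream_tokens(tokens, agent_name)]


chunks = asyncio.run(collect(["Thought:", " I", " know."]))
```

In LlamaIndex itself, callers typically consume such events by iterating `handler.stream_events()` on the handler returned by `agent.run(...)`; the sketch above only models the payload.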

Early Stopping Strategies

The agent enforces a maximum iteration count (default: 20) to prevent infinite loops. Two strategies are available when the limit is reached:

Force (Default)

Raises a WorkflowRuntimeError with a descriptive message. This is the safest option because it makes failures explicit:

from llama_index.core.agent.workflow import ReActAgent

# Default behavior -- raises an error at 20 iterations (llm is an already-configured LLM)
agent = ReActAgent(tools=[...], llm=llm, early_stopping_method="force")

Generate

Makes one final LLM call with a special prompt asking the model to synthesize a response from the information gathered so far. This provides a "best effort" answer rather than an error:

# Graceful degradation -- generates a response at the limit
agent = ReActAgent(tools=[...], llm=llm, early_stopping_method="generate")

The generate prompt instructs the model: "You have reached the maximum number of iterations. Please provide your best answer based on the information gathered so far."
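The difference between the two strategies comes down to what happens after the loop runs out of iterations. The sketch below models that branch only; `run_loop`, `IterationLimitError`, and `summarize_notes` are hypothetical names, and the summarizer is a placeholder for the extra LLM call.

```python
from typing import Callable, List


class IterationLimitError(RuntimeError):
    """Hypothetical stand-in for the error raised under the "force" strategy."""


def summarize_notes(notes: List[str]) -> str:
    """Placeholder for the final LLM call made under the "generate" strategy."""
    return "Best effort: " + "; ".join(notes)


def run_loop(
    step: Callable[[List[str]], str],
    max_iterations: int = 20,
    early_stopping_method: str = "force",
) -> str:
    notes: List[str] = []
    for _ in range(max_iterations):
        output = step(notes)
        if output.startswith("Answer:"):
            return output[len("Answer:"):].strip()
        notes.append(output)
    # The limit was reached without a final answer.
    if early_stopping_method == "generate":
        return summarize_notes(notes)
    raise IterationLimitError(f"Max iterations of {max_iterations} reached")


def never_finishes(notes: List[str]) -> str:
    """A step that keeps observing and never answers."""
    return f"observation {len(notes) + 1}"


graceful = run_loop(never_finishes, max_iterations=2, early_stopping_method="generate")
try:
    run_loop(never_finishes, max_iterations=2)
    forced_error = None
except IterationLimitError as exc:
    forced_error = str(exc)
```

"force" suits pipelines where a partial answer would be misleading, while "generate" suits interactive use where some answer is better than an exception.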

Error Recovery

The execution loop includes built-in error recovery for common failure modes:

  • Empty LLM response -- If the LLM returns an empty message, a retry is triggered with an error message instructing the LLM to follow the Thought/Action/Answer format.
  • Parse errors -- If the ReActOutputParser cannot parse the LLM output, a retry is triggered with the parse error and format instructions, giving the LLM a chance to correct itself.
  • Tool not found -- If the LLM requests a tool that does not exist, a ToolOutput with is_error=True is returned, and the observation is added to the reasoning chain so the LLM can try a different tool.
  • Tool execution errors -- If a tool raises an exception, the error message is captured as the tool output and returned as an observation.

In both retry cases (empty response and parse error), the original (malformed) LLM message and a correction prompt are appended to the input, and the reasoning step is retried.
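The retry mechanism can be sketched as a loop that grows the prompt with each failed attempt. Everything here is illustrative: `step_with_retry`, the `max_retries` parameter, the correction strings, and the crude "is it parseable" check are assumptions, not the library's behaviour.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)

EMPTY_HINT = "Error: empty response. Use the Thought/Action/Answer format."
PARSE_HINT = "Error: could not parse output. Use the Thought/Action/Answer format."


def step_with_retry(
    llm: Callable[[List[Message]], str],
    messages: List[Message],
    max_retries: int = 2,
) -> str:
    """Call the LLM, retrying with a correction prompt on bad output."""
    attempt_input = list(messages)
    for _ in range(max_retries + 1):
        reply = llm(attempt_input)
        # Crude stand-in for a real parser: look for a known marker.
        if "Answer:" in reply or "Action:" in reply:
            return reply
        hint = PARSE_HINT if reply.strip() else EMPTY_HINT
        # Append the malformed reply plus the correction prompt, then retry.
        attempt_input = attempt_input + [("assistant", reply), ("user", hint)]
    raise ValueError("LLM did not produce a parseable reply")


replies = iter(["", "I think it is 5", "Thought: done\nAnswer: 5"])
prompt_sizes: List[int] = []


def scripted_llm(msgs: List[Message]) -> str:
    """Scripted replies stand in for a real model."""
    prompt_sizes.append(len(msgs))
    return next(replies)


reply = step_with_retry(scripted_llm, [("user", "What is 2 + 3?")])
```

Each retry adds two messages, so the model sees both its own malformed output and the instruction explaining what went wrong.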

Memory Management

The agent uses a ChatMemoryBuffer (or any BaseMemory implementation) to maintain conversation history across the execution loop:

  • The user message is added at initialization
  • The full reasoning chain (all thoughts, actions, observations) is serialized into a single assistant message at finalization
  • This ensures that in follow-up queries, the agent has access to its prior reasoning

Knowledge Sources

  • ReAct: Synergizing Reasoning and Acting in Language Models
  • LlamaIndex Agents Documentation
  • LlamaIndex GitHub Repository

Implementation

Implementation:Run_llama_Llama_index_ReActAgent_Run

2026-02-11 00:00 GMT
