Workflow: Run LlamaIndex ReAct Agent
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Agents, Tool_Use |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
End-to-end process for building a ReAct (Reasoning + Acting) agent that uses tools to solve multi-step tasks through an iterative thought-action-observation loop.
Description
This workflow creates a conversational agent using the ReAct pattern, where the LLM reasons about a task, selects and executes tools, observes the results, and iterates until it arrives at a final answer. The agent supports function tools (wrapping arbitrary Python functions), query engine tools (wrapping LlamaIndex indices), and retriever tools. It uses the LlamaIndex Workflow framework for orchestration, with built-in support for streaming, structured output, and conversation memory.
Usage
Execute this workflow when you need an LLM-powered agent that can interact with external tools, APIs, or data sources to solve complex multi-step tasks. This is appropriate when a single query-response cycle is insufficient and the agent needs to reason about which actions to take, execute them, and incorporate results into its reasoning.
Execution Steps
Step 1: Define Tools
Create the set of tools the agent can use. Tools are created by wrapping Python functions with FunctionTool, wrapping query engines with QueryEngineTool, or wrapping retrievers with RetrieverTool. Each tool has a name, description, and parameter schema that the LLM uses to decide when and how to invoke it.
Key considerations:
- Function signatures are automatically converted to JSON schemas for the LLM
- Tool descriptions are critical for the LLM to select the right tool
- Tools with return_direct=True bypass further reasoning and return immediately
- A ToolRetriever can dynamically select relevant tools from a large tool set
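The signature-to-schema conversion described above can be sketched without LlamaIndex at all. The following is a dependency-free, illustrative approximation of what `FunctionTool` does when it builds a tool spec from a Python function; the helper name `make_tool_schema` and the type map are assumptions, not the library's actual implementation.

```python
import inspect

# Illustrative mapping from Python annotations to JSON-schema types.
_PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def make_tool_schema(fn):
    """Build a tool spec (name, description, parameter schema) from a function.

    Hypothetical sketch approximating FunctionTool's behavior: the function
    name becomes the tool name, the docstring becomes the description the LLM
    reads, and the signature becomes a JSON schema for the arguments.
    """
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        py_type = (param.annotation
                   if param.annotation is not inspect.Parameter.empty else str)
        properties[name] = {"type": _PY_TO_JSON.get(py_type, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties,
                       "required": required},
    }

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b

schema = make_tool_schema(multiply)
```

Because the docstring ends up in the schema, writing it for the LLM (what the tool does and when to use it) matters as much as the code itself.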
Step 2: Configure Agent
Create a ReActAgent instance with the tools, LLM, system prompt, and behavioral parameters. The agent initializes a ReActChatFormatter for prompt formatting and a ReActOutputParser for parsing the LLM output into thought, action, and answer components.
Key considerations:
- max_iterations limits the thought-action-observation loop (default: 20)
- streaming=True enables real-time token delivery
- output_cls can be set to a Pydantic model for structured responses
- The system prompt can include domain-specific instructions
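The parsing work that the `ReActOutputParser` performs can be sketched with a small regex-based parser. This is a simplified, hypothetical stand-in (not the library's actual implementation): each LLM completion either proposes a tool call (`Action:` plus `Action Input:`) or gives a final `Answer:`.

```python
import json
import re

def parse_react_output(text: str) -> dict:
    """Parse an LLM completion in ReAct format into thought/action/answer parts.

    Simplified sketch: a final answer wins; otherwise we expect a thought,
    a tool name, and a JSON object of tool arguments.
    """
    answer = re.search(r"Answer:\s*(.*)", text, re.DOTALL)
    if answer:
        return {"type": "final_answer", "answer": answer.group(1).strip()}
    thought = re.search(r"Thought:\s*(.*?)\n", text)
    action = re.search(r"Action:\s*(\w+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*?\})", text, re.DOTALL)
    if action and action_input:
        return {
            "type": "tool_call",
            "thought": thought.group(1).strip() if thought else "",
            "tool_name": action.group(1),
            "tool_kwargs": json.loads(action_input.group(1)),
        }
    raise ValueError("Could not parse ReAct output")

step = parse_react_output(
    "Thought: I need the product of 6 and 7.\n"
    "Action: multiply\n"
    'Action Input: {"a": 6, "b": 7}'
)
```

The unparseable case is the interesting one in practice: the real agent feeds a parse failure back to the LLM as an observation so it can retry, rather than crashing.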
Step 3: Run Agent
Execute the agent with a user message or chat history. The agent enters an iterative loop: it sends the conversation context to the LLM, parses the response for tool calls or a final answer, executes any tool calls, adds results to memory, and repeats until a final answer is produced or the iteration limit is reached.
Key considerations:
- The agent maintains conversation history via ChatMemoryBuffer
- Each iteration produces a thought (reasoning), action (tool call), and observation (result)
- The agent can make multiple tool calls in a single iteration if the LLM supports it
- Call agent.run() to obtain a handler for async execution; await the handler to get the final output
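The loop described in this step can be sketched with a stubbed LLM standing in for real model calls. Everything below is illustrative (the actual Workflow-based implementation differs): `llm_step` represents one LLM call already parsed into a step, and `tools` is a plain name-to-callable mapping.

```python
def run_react_loop(llm_step, tools, user_msg, max_iterations=20):
    """Minimal thought-action-observation loop (illustrative sketch).

    llm_step: callable taking the message history and returning a parsed step,
        either {"type": "tool_call", ...} or {"type": "final_answer", ...}.
    tools: mapping of tool name -> Python callable.
    """
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_iterations):
        step = llm_step(messages)
        if step["type"] == "final_answer":
            return step["answer"], messages
        # Execute the requested tool, then feed the observation back in
        # so the next LLM call can reason over the result.
        observation = tools[step["tool_name"]](**step["tool_kwargs"])
        messages.append({"role": "assistant",
                         "content": f"Action: {step['tool_name']}({step['tool_kwargs']})"})
        messages.append({"role": "tool", "content": f"Observation: {observation}"})
    raise RuntimeError("max_iterations reached without a final answer")

# Stub LLM: request the multiply tool once, then answer with the observation.
def stub_llm(messages):
    if any(m["role"] == "tool" for m in messages):
        obs = messages[-1]["content"].removeprefix("Observation: ")
        return {"type": "final_answer", "answer": obs}
    return {"type": "tool_call", "tool_name": "multiply",
            "tool_kwargs": {"a": 6, "b": 7}}

answer, history = run_react_loop(stub_llm, {"multiply": lambda a, b: a * b},
                                 "What is 6 times 7?")
```

The `max_iterations` guard is what keeps a confused model from looping forever; on exhaustion the real agent surfaces an error rather than a partial answer.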
Step 4: Process Results
Extract the final answer and tool call history from the AgentOutput. The output includes the synthesized response, the list of tool calls made, and the raw LLM messages. Source nodes from query engine tools are accessible for attribution.
Key considerations:
- AgentOutput contains response (ChatMessage), tool_calls, and raw messages
- For streaming, iterate over the handler to receive token deltas
- The memory persists across calls for multi-turn conversations
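Result processing can be sketched with a minimal stand-in for the output object. The dataclasses below are hypothetical (field names mirror the description above, not the library's exact classes); the point is that the final answer and the tool-call trace travel together, so attribution and debugging need no extra bookkeeping.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallResult:
    """One executed tool call: what was called, with what, and what came back."""
    tool_name: str
    tool_kwargs: dict
    tool_output: str

@dataclass
class AgentOutput:
    """Illustrative stand-in mirroring the fields described above."""
    response: str                           # final synthesized answer
    tool_calls: list = field(default_factory=list)
    raw_messages: list = field(default_factory=list)

def summarize(output: AgentOutput) -> str:
    """Render the answer plus a tool-call trace for attribution/debugging."""
    lines = [f"Answer: {output.response}"]
    for call in output.tool_calls:
        lines.append(f"  used {call.tool_name}({call.tool_kwargs})"
                     f" -> {call.tool_output}")
    return "\n".join(lines)

out = AgentOutput(
    response="6 times 7 is 42.",
    tool_calls=[ToolCallResult("multiply", {"a": 6, "b": 7}, "42")],
)
report = summarize(out)
```

A trace like this is also the natural place to surface source nodes from query engine tools when the answer needs citations.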