Workflow: Run LlamaIndex ReAct Agent
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Agents, Tool_Use |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
End-to-end process for building a ReAct (Reasoning + Acting) agent that uses tools to solve multi-step tasks through an iterative thought-action-observation loop.
Description
This workflow creates a conversational agent using the ReAct pattern, where the LLM reasons about a task, selects and executes tools, observes the results, and iterates until it arrives at a final answer. The agent supports function tools (wrapping arbitrary Python functions), query engine tools (wrapping LlamaIndex indices), and retriever tools. It uses the LlamaIndex Workflow framework for orchestration, with built-in support for streaming, structured output, and conversation memory.
Usage
Execute this workflow when you need an LLM-powered agent that can interact with external tools, APIs, or data sources to solve complex multi-step tasks. This is appropriate when a single query-response cycle is insufficient and the agent needs to reason about which actions to take, execute them, and incorporate results into its reasoning.
Execution Steps
Step 1: Define Tools
Create the set of tools the agent can use. Tools are created by wrapping Python functions with FunctionTool, wrapping query engines with QueryEngineTool, or wrapping retrievers with RetrieverTool. Each tool has a name, description, and parameter schema that the LLM uses to decide when and how to invoke it.
Key considerations:
- Function signatures are automatically converted to JSON schemas for the LLM
- Tool descriptions are critical for the LLM to select the right tool
- Tools with return_direct=True bypass further reasoning and return immediately
- A ToolRetriever can dynamically select relevant tools from a large tool set
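The signature-to-schema conversion described above can be sketched without LlamaIndex at all. The following is a dependency-free, illustrative approximation of what `FunctionTool` does when it builds a tool spec from a Python function; the helper name `make_tool_schema` and the type map are assumptions, not the library's actual implementation.

```python
import inspect

# Illustrative mapping from Python annotations to JSON-schema types.
_PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def make_tool_schema(fn):
    """Build a tool spec (name, description, parameter schema) from a function.

    Hypothetical sketch approximating FunctionTool's behavior: the function
    name becomes the tool name, the docstring becomes the description the LLM
    reads, and the signature becomes a JSON schema for the arguments.
    """
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        py_type = (param.annotation
                   if param.annotation is not inspect.Parameter.empty else str)
        properties[name] = {"type": _PY_TO_JSON.get(py_type, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties,
                       "required": required},
    }

def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b

schema = make_tool_schema(multiply)
```

Because the docstring ends up in the schema, writing it for the LLM (what the tool does and when to use it) matters as much as the code itself.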
Step 2: Configure Agent
Create a ReActAgent instance with the tools, LLM, system prompt, and behavioral parameters. The agent initializes a ReActChatFormatter for prompt formatting and a ReActOutputParser for parsing the LLM output into thought, action, and answer components.
Key considerations:
- max_iterations limits the thought-action-observation loop (default: 20)
- streaming=True enables real-time token delivery
- output_cls can be set to a Pydantic model for structured responses
- The system prompt can include domain-specific instructions
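The parsing work that the `ReActOutputParser` performs can be sketched with a small regex-based parser. This is a simplified, hypothetical stand-in (not the library's actual implementation): each LLM completion either proposes a tool call (`Action:` plus `Action Input:`) or gives a final `Answer:`.

```python
import json
import re

def parse_react_output(text: str) -> dict:
    """Parse an LLM completion in ReAct format into thought/action/answer parts.

    Simplified sketch: a final answer wins; otherwise we expect a thought,
    a tool name, and a JSON object of tool arguments.
    """
    answer = re.search(r"Answer:\s*(.*)", text, re.DOTALL)
    if answer:
        return {"type": "final_answer", "answer": answer.group(1).strip()}
    thought = re.search(r"Thought:\s*(.*?)\n", text)
    action = re.search(r"Action:\s*(\w+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*?\})", text, re.DOTALL)
    if action and action_input:
        return {
            "type": "tool_call",
            "thought": thought.group(1).strip() if thought else "",
            "tool_name": action.group(1),
            "tool_kwargs": json.loads(action_input.group(1)),
        }
    raise ValueError("Could not parse ReAct output")

step = parse_react_output(
    "Thought: I need the product of 6 and 7.\n"
    "Action: multiply\n"
    'Action Input: {"a": 6, "b": 7}'
)
```

The unparseable case is the interesting one in practice: the real agent feeds a parse failure back to the LLM as an observation so it can retry, rather than crashing.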
Step 3: Run Agent
Execute the agent with a user message or chat history. The agent enters an iterative loop: it sends the conversation context to the LLM, parses the response for tool calls or a final answer, executes any tool calls, adds results to memory, and repeats until a final answer is produced or the iteration limit is reached.
Key considerations:
- The agent maintains conversation history via ChatMemoryBuffer
- Each iteration produces a thought (reasoning), action (tool call), and observation (result)
- The agent can make multiple tool calls in a single iteration if the LLM supports it
- Call agent.run() to obtain a handler for async execution; await the handler to get the final output
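The loop described in this step can be sketched with a stubbed LLM standing in for real model calls. Everything below is illustrative (the actual Workflow-based implementation differs): `llm_step` represents one LLM call already parsed into a step, and `tools` is a plain name-to-callable mapping.

```python
def run_react_loop(llm_step, tools, user_msg, max_iterations=20):
    """Minimal thought-action-observation loop (illustrative sketch).

    llm_step: callable taking the message history and returning a parsed step,
        either {"type": "tool_call", ...} or {"type": "final_answer", ...}.
    tools: mapping of tool name -> Python callable.
    """
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_iterations):
        step = llm_step(messages)
        if step["type"] == "final_answer":
            return step["answer"], messages
        # Execute the requested tool, then feed the observation back in
        # so the next LLM call can reason over the result.
        observation = tools[step["tool_name"]](**step["tool_kwargs"])
        messages.append({"role": "assistant",
                         "content": f"Action: {step['tool_name']}({step['tool_kwargs']})"})
        messages.append({"role": "tool", "content": f"Observation: {observation}"})
    raise RuntimeError("max_iterations reached without a final answer")

# Stub LLM: request the multiply tool once, then answer with the observation.
def stub_llm(messages):
    if any(m["role"] == "tool" for m in messages):
        obs = messages[-1]["content"].removeprefix("Observation: ")
        return {"type": "final_answer", "answer": obs}
    return {"type": "tool_call", "tool_name": "multiply",
            "tool_kwargs": {"a": 6, "b": 7}}

answer, history = run_react_loop(stub_llm, {"multiply": lambda a, b: a * b},
                                 "What is 6 times 7?")
```

The `max_iterations` guard is what keeps a confused model from looping forever; on exhaustion the real agent surfaces an error rather than a partial answer.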
Step 4: Process Results
Extract the final answer and tool call history from the AgentOutput. The output includes the synthesized response, the list of tool calls made, and the raw LLM messages. Source nodes from query engine tools are accessible for attribution.
Key considerations:
- AgentOutput contains response (ChatMessage), tool_calls, and raw messages
- For streaming, iterate over the handler to receive token deltas
- The memory persists across calls for multi-turn conversations
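Result processing can be sketched with a minimal stand-in for the output object. The dataclasses below are hypothetical (field names mirror the description above, not the library's exact classes); the point is that the final answer and the tool-call trace travel together, so attribution and debugging need no extra bookkeeping.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallResult:
    """One executed tool call: what was called, with what, and what came back."""
    tool_name: str
    tool_kwargs: dict
    tool_output: str

@dataclass
class AgentOutput:
    """Illustrative stand-in mirroring the fields described above."""
    response: str                           # final synthesized answer
    tool_calls: list = field(default_factory=list)
    raw_messages: list = field(default_factory=list)

def summarize(output: AgentOutput) -> str:
    """Render the answer plus a tool-call trace for attribution/debugging."""
    lines = [f"Answer: {output.response}"]
    for call in output.tool_calls:
        lines.append(f"  used {call.tool_name}({call.tool_kwargs})"
                     f" -> {call.tool_output}")
    return "\n".join(lines)

out = AgentOutput(
    response="6 times 7 is 42.",
    tool_calls=[ToolCallResult("multiply", {"a": 6, "b": 7}, "42")],
)
report = summarize(out)
```

A trace like this is also the natural place to surface source nodes from query engine tools when the answer needs citations.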