Principle:Run llama Llama index Agent Output Processing
Overview
Agent Output Processing covers how to interpret, access, and work with the results produced by a ReAct agent. After the agent completes its reasoning loop, the output contains not just the final response text but also the complete record of tool calls, reasoning steps, and source information. Understanding the output structure is essential for building applications that display agent reasoning traces, cite sources, and handle streaming versus non-streaming execution patterns.
AI Agents Output Processing Streaming LlamaIndex
Output Structure
A ReAct agent's execution produces an AgentOutput object. This is the central data structure that encapsulates everything that happened during the agent's run:
- Response -- The final text answer as a
ChatMessage - Tool Calls -- A list of all tool invocations made during execution (both
ToolSelectionandToolCallResultobjects) - Structured Response -- Optional structured output if the agent was configured with
output_clsorstructured_output_fn - Raw LLM Output -- The raw response object from the underlying LLM provider
- Current Agent Name -- Which agent produced this output (relevant in multi-agent systems)
The Reasoning Trace
The ReAct agent maintains an internal reasoning trace during execution -- a sequence of reasoning steps that capture the full chain of thought:
| Step Type | Description | Key Fields |
|---|---|---|
| ActionReasoningStep | The agent decided to call a tool | thought, action (tool name), action_input (tool arguments)
|
| ObservationReasoningStep | A tool returned its result | observation (tool output text), return_direct flag
|
| ResponseReasoningStep | The agent produced a final answer | thought, response (answer text)
|
At finalization, the entire reasoning trace is serialized into a single assistant message and stored in memory. This means the trace is available in subsequent conversation turns but is collapsed into a single message to conserve context window tokens.
Tool Call Results
Each tool invocation during execution produces a ToolCallResult containing:
- tool_name -- The name of the tool that was called
- tool_kwargs -- The arguments passed to the tool
- tool_id -- A unique identifier for this specific invocation
- tool_output -- The
ToolOutputobject with the tool's response - return_direct -- Whether this tool's output should be returned directly to the user
The tool output itself contains:
- content -- String representation of the result
- blocks -- List of content blocks (text, images, audio)
- raw_input -- The original input arguments
- raw_output -- The unprocessed return value
- is_error -- Whether the tool call failed
Streaming vs Non-Streaming
Non-Streaming
In non-streaming mode, the caller simply awaits the final result:
handler = agent.run("What is the answer?")
result = await handler # Blocks until complete, returns AgentOutput
print(result) # Prints the response text
The response is only available after the entire ReAct loop completes. This is simpler but provides no intermediate feedback.
Streaming
In streaming mode (the default), the agent emits events throughout execution that can be consumed in real-time:
handler = agent.run("Analyze this data...")
async for event in handler.stream_events():
if isinstance(event, AgentStream):
# Token-by-token LLM output
print(event.delta, end="")
elif isinstance(event, ToolCall):
# Agent is about to call a tool
print(f"\nCalling tool: {event.tool_name}")
elif isinstance(event, ToolCallResult):
# Tool returned a result
print(f"\nTool result: {event.tool_output.content[:100]}")
elif isinstance(event, AgentOutput):
# A reasoning step completed
pass
result = await handler
The key streaming events are:
| Event Type | When Emitted | Key Fields |
|---|---|---|
| AgentStream | During LLM token generation | delta (new token), response (accumulated text), thinking_delta
|
| AgentInput | Before each LLM call | input (messages), current_agent_name
|
| ToolCall | When a tool is about to execute | tool_name, tool_kwargs, tool_id
|
| ToolCallResult | After a tool completes | tool_name, tool_output, return_direct
|
| AgentOutput | After each reasoning step | response, tool_calls, raw
|
| AgentStreamStructuredOutput | When structured output is generated | output (dict)
|
Structured Output
The agent can optionally produce structured output in addition to the text response. This is configured via two mechanisms:
output_cls
A Pydantic BaseModel subclass. After the agent finishes, an additional LLM call generates a JSON response conforming to the schema:
from pydantic import BaseModel
class AnalysisResult(BaseModel):
summary: str
confidence: float
sources: list[str]
agent = ReActAgent(tools=[...], llm=llm, output_cls=AnalysisResult)
result = await agent.run("Analyze the quarterly report.")
structured = result.get_pydantic_model(AnalysisResult)
print(structured.summary)
structured_output_fn
A custom function that takes the full chat history and returns a dictionary. This gives complete control over how structured output is derived:
def extract_output(messages: list[ChatMessage]) -> dict:
last_msg = messages[-1].content
return {"answer": last_msg, "num_messages": len(messages)}
agent = ReActAgent(tools=[...], llm=llm, structured_output_fn=extract_output)
Error Handling in Outputs
When processing agent outputs, it is important to check for error conditions:
- Tool errors -- Check
tool_output.is_erroron individual tool call results - Parse failures -- The agent internally retries, but if all retries fail, the output may contain the raw unparsed text
- Max iterations -- With
early_stopping_method="generate", the response will be a "best effort" synthesis rather than a precise answer - Structured output failures --
get_pydantic_model()may returnNoneif validation fails, with a warning emitted
Knowledge Sources
LlamaIndex Agents Documentation LlamaIndex GitHub Repository