Principle:Run llama Llama index Agent Output Processing

Overview

Agent Output Processing covers how to interpret, access, and work with the results produced by a ReAct agent. After the agent completes its reasoning loop, the output contains not just the final response text but also the complete record of tool calls, reasoning steps, and source information. Understanding the output structure is essential for building applications that display agent reasoning traces, cite sources, and handle streaming versus non-streaming execution patterns.

AI Agents Output Processing Streaming LlamaIndex

Output Structure

A ReAct agent's execution produces an AgentOutput object. This is the central data structure that encapsulates everything that happened during the agent's run:

Response -- The final text answer as a ChatMessage
Tool Calls -- A list of all tool invocations made during execution (both ToolSelection and ToolCallResult objects)
Structured Response -- Optional structured output if the agent was configured with output_cls or structured_output_fn
Raw LLM Output -- The raw response object from the underlying LLM provider
Current Agent Name -- Which agent produced this output (relevant in multi-agent systems)

The Reasoning Trace

The ReAct agent maintains an internal reasoning trace during execution -- a sequence of reasoning steps that capture the full chain of thought:

Step Type	Description	Key Fields
ActionReasoningStep	The agent decided to call a tool	`thought`, `action` (tool name), `action_input` (tool arguments)
ObservationReasoningStep	A tool returned its result	`observation` (tool output text), `return_direct` flag
ResponseReasoningStep	The agent produced a final answer	`thought`, `response` (answer text)

At finalization, the entire reasoning trace is serialized into a single assistant message and stored in memory. This means the trace is available in subsequent conversation turns but is collapsed into a single message to conserve context window tokens.

Tool Call Results

Each tool invocation during execution produces a ToolCallResult containing:

tool_name -- The name of the tool that was called
tool_kwargs -- The arguments passed to the tool
tool_id -- A unique identifier for this specific invocation
tool_output -- The ToolOutput object with the tool's response
return_direct -- Whether this tool's output should be returned directly to the user

The tool output itself contains:

content -- String representation of the result
blocks -- List of content blocks (text, images, audio)
raw_input -- The original input arguments
raw_output -- The unprocessed return value
is_error -- Whether the tool call failed

Streaming vs Non-Streaming

Non-Streaming

In non-streaming mode, the caller simply awaits the final result:

handler = agent.run("What is the answer?")
result = await handler  # Blocks until complete, returns AgentOutput
print(result)           # Prints the response text

The response is only available after the entire ReAct loop completes. This is simpler but provides no intermediate feedback.

Streaming

In streaming mode (the default), the agent emits events throughout execution that can be consumed in real-time:

handler = agent.run("Analyze this data...")
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        # Token-by-token LLM output
        print(event.delta, end="")
    elif isinstance(event, ToolCall):
        # Agent is about to call a tool
        print(f"\nCalling tool: {event.tool_name}")
    elif isinstance(event, ToolCallResult):
        # Tool returned a result
        print(f"\nTool result: {event.tool_output.content[:100]}")
    elif isinstance(event, AgentOutput):
        # A reasoning step completed
        pass
result = await handler

The key streaming events are:

Event Type	When Emitted	Key Fields
AgentStream	During LLM token generation	`delta` (new token), `response` (accumulated text), `thinking_delta`
AgentInput	Before each LLM call	`input` (messages), `current_agent_name`
ToolCall	When a tool is about to execute	`tool_name`, `tool_kwargs`, `tool_id`
ToolCallResult	After a tool completes	`tool_name`, `tool_output`, `return_direct`
AgentOutput	After each reasoning step	`response`, `tool_calls`, `raw`
AgentStreamStructuredOutput	When structured output is generated	`output` (dict)

Structured Output

The agent can optionally produce structured output in addition to the text response. This is configured via two mechanisms:

output_cls

A Pydantic BaseModel subclass. After the agent finishes, an additional LLM call generates a JSON response conforming to the schema:

from pydantic import BaseModel

class AnalysisResult(BaseModel):
    summary: str
    confidence: float
    sources: list[str]

agent = ReActAgent(tools=[...], llm=llm, output_cls=AnalysisResult)
result = await agent.run("Analyze the quarterly report.")
structured = result.get_pydantic_model(AnalysisResult)
print(structured.summary)

structured_output_fn

A custom function that takes the full chat history and returns a dictionary. This gives complete control over how structured output is derived:

def extract_output(messages: list[ChatMessage]) -> dict:
    last_msg = messages[-1].content
    return {"answer": last_msg, "num_messages": len(messages)}

agent = ReActAgent(tools=[...], llm=llm, structured_output_fn=extract_output)

Error Handling in Outputs

When processing agent outputs, it is important to check for error conditions:

Tool errors -- Check tool_output.is_error on individual tool call results
Parse failures -- The agent internally retries, but if all retries fail, the output may contain the raw unparsed text
Max iterations -- With early_stopping_method="generate", the response will be a "best effort" synthesis rather than a precise answer
Structured output failures -- get_pydantic_model() may return None if validation fails, with a warning emitted

Knowledge Sources

LlamaIndex Agents Documentation LlamaIndex GitHub Repository

Implementation

Implementation:Run_llama_Llama_index_AgentOutput_Processing

2026-02-11 00:00 GMT

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment