Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Run llama Llama index Agent Output Processing

From Leeroopedia

Overview

Agent Output Processing covers how to interpret, access, and work with the results produced by a ReAct agent. After the agent completes its reasoning loop, the output contains not just the final response text but also the complete record of tool calls, reasoning steps, and source information. Understanding the output structure is essential for building applications that display agent reasoning traces, cite sources, and handle streaming versus non-streaming execution patterns.

AI Agents Output Processing Streaming LlamaIndex

Output Structure

A ReAct agent's execution produces an AgentOutput object. This is the central data structure that encapsulates everything that happened during the agent's run:

  • Response -- The final text answer as a ChatMessage
  • Tool Calls -- A list of all tool invocations made during execution (both ToolSelection and ToolCallResult objects)
  • Structured Response -- Optional structured output if the agent was configured with output_cls or structured_output_fn
  • Raw LLM Output -- The raw response object from the underlying LLM provider
  • Current Agent Name -- Which agent produced this output (relevant in multi-agent systems)

The Reasoning Trace

The ReAct agent maintains an internal reasoning trace during execution -- a sequence of reasoning steps that capture the full chain of thought:

Step Type Description Key Fields
ActionReasoningStep The agent decided to call a tool thought, action (tool name), action_input (tool arguments)
ObservationReasoningStep A tool returned its result observation (tool output text), return_direct flag
ResponseReasoningStep The agent produced a final answer thought, response (answer text)

At finalization, the entire reasoning trace is serialized into a single assistant message and stored in memory. This means the trace is available in subsequent conversation turns but is collapsed into a single message to conserve context window tokens.

Tool Call Results

Each tool invocation during execution produces a ToolCallResult containing:

  • tool_name -- The name of the tool that was called
  • tool_kwargs -- The arguments passed to the tool
  • tool_id -- A unique identifier for this specific invocation
  • tool_output -- The ToolOutput object with the tool's response
  • return_direct -- Whether this tool's output should be returned directly to the user

The tool output itself contains:

  • content -- String representation of the result
  • blocks -- List of content blocks (text, images, audio)
  • raw_input -- The original input arguments
  • raw_output -- The unprocessed return value
  • is_error -- Whether the tool call failed

Streaming vs Non-Streaming

Non-Streaming

In non-streaming mode, the caller simply awaits the final result:

handler = agent.run("What is the answer?")
result = await handler  # Blocks until complete, returns AgentOutput
print(result)           # Prints the response text

The response is only available after the entire ReAct loop completes. This is simpler but provides no intermediate feedback.

Streaming

In streaming mode (the default), the agent emits events throughout execution that can be consumed in real-time:

handler = agent.run("Analyze this data...")
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        # Token-by-token LLM output
        print(event.delta, end="")
    elif isinstance(event, ToolCall):
        # Agent is about to call a tool
        print(f"\nCalling tool: {event.tool_name}")
    elif isinstance(event, ToolCallResult):
        # Tool returned a result
        print(f"\nTool result: {event.tool_output.content[:100]}")
    elif isinstance(event, AgentOutput):
        # A reasoning step completed
        pass
result = await handler

The key streaming events are:

Event Type When Emitted Key Fields
AgentStream During LLM token generation delta (new token), response (accumulated text), thinking_delta
AgentInput Before each LLM call input (messages), current_agent_name
ToolCall When a tool is about to execute tool_name, tool_kwargs, tool_id
ToolCallResult After a tool completes tool_name, tool_output, return_direct
AgentOutput After each reasoning step response, tool_calls, raw
AgentStreamStructuredOutput When structured output is generated output (dict)

Structured Output

The agent can optionally produce structured output in addition to the text response. This is configured via two mechanisms:

output_cls

A Pydantic BaseModel subclass. After the agent finishes, an additional LLM call generates a JSON response conforming to the schema:

from pydantic import BaseModel

class AnalysisResult(BaseModel):
    summary: str
    confidence: float
    sources: list[str]

agent = ReActAgent(tools=[...], llm=llm, output_cls=AnalysisResult)
result = await agent.run("Analyze the quarterly report.")
structured = result.get_pydantic_model(AnalysisResult)
print(structured.summary)

structured_output_fn

A custom function that takes the full chat history and returns a dictionary. This gives complete control over how structured output is derived:

def extract_output(messages: list[ChatMessage]) -> dict:
    last_msg = messages[-1].content
    return {"answer": last_msg, "num_messages": len(messages)}

agent = ReActAgent(tools=[...], llm=llm, structured_output_fn=extract_output)

Error Handling in Outputs

When processing agent outputs, it is important to check for error conditions:

  • Tool errors -- Check tool_output.is_error on individual tool call results
  • Parse failures -- The agent internally retries, but if all retries fail, the output may contain the raw unparsed text
  • Max iterations -- With early_stopping_method="generate", the response will be a "best effort" synthesis rather than a precise answer
  • Structured output failures -- get_pydantic_model() may return None if validation fails, with a warning emitted

Knowledge Sources

LlamaIndex Agents Documentation LlamaIndex GitHub Repository

Implementation

Implementation:Run_llama_Llama_index_AgentOutput_Processing

2026-02-11 00:00 GMT

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment