Principle:Openai Openai agents python Streamed Run Invocation

Overview

Streamed Run Invocation is the principle of executing an agent with real-time streaming of responses. Rather than waiting for an entire agent run to complete before receiving output, streaming allows incremental delivery of text deltas, tool calls, and run items as they are generated. This enables responsive user interfaces that can display partial results immediately.

The OpenAI Agents Python SDK provides this capability through the Runner.run_streamed() class method, which shares the same core run loop logic as the non-streamed Runner.run() but delivers its results incrementally through the object it returns.

Core Theory

Immediate Return with Background Processing

When Runner.run_streamed() is called, it returns a RunResultStreaming object immediately, without blocking. The actual agent processing -- including LLM inference, tool execution, and handoff resolution -- happens in the background. The caller can then consume events from the result object as they arrive.

This design separates the initiation of a run from the consumption of its output, providing a natural fit for asynchronous and event-driven architectures.
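The separation between starting a run and consuming its output can be illustrated without the SDK. The following is a minimal stdlib-only sketch, not the SDK's implementation: StreamHandle, start_streamed_run, and _fake_run are hypothetical names, and the "run" is simulated by a background task that pushes words onto an asyncio.Queue.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Optional

_DONE = object()  # sentinel marking the end of the stream


@dataclass
class StreamHandle:
    """Hypothetical stand-in for a streaming result object."""

    queue: asyncio.Queue = field(default_factory=asyncio.Queue)
    task: Optional[asyncio.Task] = None
    final_output: Optional[str] = None

    async def stream_events(self) -> AsyncIterator[str]:
        while True:
            event = await self.queue.get()
            if event is _DONE:
                break
            yield event
        await self.task  # surface any exception raised by the background work


async def _fake_run(handle: StreamHandle, prompt: str) -> None:
    """Simulated run: emits one event per word, then records the final output."""
    words = prompt.split()
    try:
        for word in words:
            await asyncio.sleep(0)  # yield control, as real I/O would
            handle.queue.put_nowait(word)
        handle.final_output = " ".join(words)
    finally:
        handle.queue.put_nowait(_DONE)


def start_streamed_run(prompt: str) -> StreamHandle:
    """Return a handle immediately; the work continues in a background task.

    Must be called while an event loop is running, because it schedules a task.
    """
    handle = StreamHandle()
    handle.task = asyncio.create_task(_fake_run(handle, prompt))
    return handle


async def demo() -> tuple:
    handle = start_streamed_run("streamed runs return immediately")
    before = handle.final_output  # still None: the background task has not run yet
    events = [event async for event in handle.stream_events()]
    return before, events, handle.final_output
```

In this toy model the handle exists before any work has been done, and the final output is populated only after the event stream has ended, which mirrors the initiation/consumption split described above.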

Three Event Types

The streaming interface delivers three distinct types of events through its stream_events() async iterator:

RawResponsesStreamEvent -- Delivers raw LLM chunks as they arrive from the model. These include text deltas (partial text tokens), streaming tool call arguments, and other low-level response data. This event type is essential for real-time text rendering in UIs.

RunItemStreamEvent -- Wraps higher-level semantic items that represent meaningful milestones in the agent run. These include events such as message_output_created, tool_called, tool_output, and handoff_requested. These events allow consumers to react to structured agent actions.

AgentUpdatedStreamEvent -- Signals when a handoff changes the currently active agent. This notifies consumers that a different agent has taken over processing, which is important for multi-agent workflows.
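A consumer typically branches on each event's type field. The sketch below assumes the discriminator strings raw_response_event, run_item_stream_event, and agent_updated_stream_event, and the attributes data, name, and new_agent, as exposed by the SDK's event classes; these should be checked against the installed version. describe_event is a hypothetical helper, and it relies on duck typing so that it can be exercised with stand-in objects instead of a live run.

```python
from types import SimpleNamespace
from typing import Any


def describe_event(event: Any) -> str:
    """Summarize a stream event by branching on its `type` discriminator."""
    if event.type == "raw_response_event":
        # Low-level model chunk; `data` is the raw streaming payload.
        return f"raw chunk: {event.data.type}"
    if event.type == "run_item_stream_event":
        # Higher-level milestone such as tool_called or tool_output.
        return f"run item: {event.name}"
    if event.type == "agent_updated_stream_event":
        # A handoff changed the currently active agent.
        return f"active agent: {event.new_agent.name}"
    return f"unknown event type: {event.type}"


# Stand-in objects shaped like the three event types, for illustration only.
samples = [
    SimpleNamespace(
        type="raw_response_event",
        data=SimpleNamespace(type="response.output_text.delta", delta="Hel"),
    ),
    SimpleNamespace(type="run_item_stream_event", name="tool_called", item=None),
    SimpleNamespace(
        type="agent_updated_stream_event",
        new_agent=SimpleNamespace(name="Billing agent"),
    ),
]
summaries = [describe_event(e) for e in samples]
```

With a real run, the same function could be called inside the loop over result.stream_events().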

The stream_events() Async Iterator

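The stream_events() method on RunResultStreaming returns an async iterator that yields events as they occur. The run itself proceeds in the background; iterating until the stream ends is how the caller observes the run reaching completion, and the final result fields should be read only after that point. The pattern is: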

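import asyncio
from agents import Agent, Runner

async def main():
    agent = Agent(name="Assistant", instructions="Reply concisely.")
    result = Runner.run_streamed(agent, "Tell me a joke.")

    async for event in result.stream_events():
        # Process each event as it arrives
        print(event.type)

    # After the loop, the run is complete
    print(result.final_output)

asyncio.run(main())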
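A common use of this loop is accumulating text deltas for display. The sketch below is one way to do that, assuming raw events carry a data payload whose type is response.output_text.delta and whose delta attribute holds the text fragment. collect_text and fake_stream are hypothetical names, and fake_stream stands in for result.stream_events() so the example runs without a model call.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, List


async def collect_text(events: AsyncIterator[Any]) -> str:
    """Concatenate the text deltas found in a stream of events."""
    parts: List[str] = []
    async for event in events:
        if event.type != "raw_response_event":
            continue  # ignore run-item and agent-updated events here
        if getattr(event.data, "type", None) == "response.output_text.delta":
            parts.append(event.data.delta)
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Any]:
    """Stand-in event source: three text deltas followed by a run-item event."""
    for chunk in ("Hel", "lo", "!"):
        yield SimpleNamespace(
            type="raw_response_event",
            data=SimpleNamespace(type="response.output_text.delta", delta=chunk),
        )
    yield SimpleNamespace(
        type="run_item_stream_event", name="message_output_created", item=None
    )


text = asyncio.run(collect_text(fake_stream()))
```

A UI would usually render each fragment as it arrives instead of joining them at the end; the accumulation here only keeps the example easy to check.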

Final Result Availability

After the stream completes (all events have been consumed), the RunResultStreaming object provides access to the same final result data as a non-streamed run:

final_output -- The final text or structured output from the agent.
new_items -- A list of all run items produced during execution.
raw_responses -- The complete raw LLM responses collected during the run.

This means that after streaming, the result can be used identically to a result from Runner.run().

Behavioral Parity with Non-Streamed Runs

A critical design principle is that the streaming and non-streaming execution paths share core run loop logic. This ensures behavioral parity: the same agent configuration, tools, guardrails, and handoff logic are applied in the same way regardless of whether streaming is used. The only difference is how the results are delivered, either all at once or incrementally.

This parity simplifies reasoning about agent behavior and ensures that switching between streamed and non-streamed execution is a presentation concern, not a behavioral one.
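The value of a shared core can be shown with a toy model. This is an illustration of the design idea only, not the SDK's internal structure: _core_steps, run, run_streamed, and compare are hypothetical functions, and the "agent logic" is reduced to upper-casing words.

```python
import asyncio
from typing import AsyncIterator, List


async def _core_steps(prompt: str) -> AsyncIterator[str]:
    """Hypothetical shared core: the one place where the run's behavior lives."""
    for word in prompt.upper().split():
        await asyncio.sleep(0)  # stand-in for model or tool latency
        yield word


async def run(prompt: str) -> str:
    """Non-streamed delivery: drain the core and return everything at once."""
    return " ".join([step async for step in _core_steps(prompt)])


async def run_streamed(prompt: str, sink: List[str]) -> str:
    """Streamed delivery: hand each step to the consumer as it is produced."""
    async for step in _core_steps(prompt):
        sink.append(step)  # a UI would render the partial result here
    return " ".join(sink)


async def compare(prompt: str) -> tuple:
    seen: List[str] = []
    whole = await run(prompt)
    streamed = await run_streamed(prompt, seen)
    return whole, streamed, seen
```

Because both wrappers delegate to the same generator, any change to the core affects both delivery modes equally, which is the property the parity principle relies on.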

Source Reference

Source: src/agents/run.py, lines 312-382 (Runner.run_streamed)
Import: from agents import Runner
