Principle: Model Execution (OpenAI Agents Python)
| Property | Value |
|---|---|
| Principle Name | Model Execution |
| SDK | OpenAI Agents Python |
| Repository | openai-agents-python |
| Source Reference | src/agents/models/openai_responses.py:L82-157 |
| Import | from agents.models.openai_responses import OpenAIResponsesModel |
Overview
The Model Execution principle describes the abstraction layer for LLM invocation within the OpenAI Agents Python SDK. This layer is responsible for translating the agent's configuration (instructions, tools, output schema, handoffs) into an actual API call to a language model, and converting the raw API response into a standardized ModelResponse that the run loop can process.
Description
The model execution layer sits between the Runner (orchestration) and the external LLM API. Its responsibilities include:
- Input Assembly: Combining system instructions, conversation history, tool schemas, output schema constraints, and handoff definitions into the format expected by the model provider's API.
- API Invocation: Making the actual HTTP call to the LLM provider (e.g., OpenAI Responses API, Chat Completions API, or third-party providers via LiteLLM).
- Response Normalization: Converting the provider-specific response format into a standardized ModelResponse containing output items, usage statistics, and a response ID.
- Tracing Integration: Recording model inputs, outputs, and errors into the tracing system for observability and debugging.
Provider Abstraction
The SDK defines a Model abstract interface that any model backend must implement. The two primary methods are:
- get_response(): Non-streaming invocation that returns a complete ModelResponse.
- stream_response(): Streaming invocation that yields response chunks incrementally.
This abstraction allows the SDK to support multiple backends:
- OpenAIResponsesModel: Uses the OpenAI Responses API (the default and recommended backend).
- OpenAIChatCompletionsModel: Uses the OpenAI Chat Completions API for broader compatibility.
- LitellmModel: Uses the LiteLLM library for access to 100+ LLM providers.
Input Construction
Before calling the model, the run loop constructs the full input payload:
- System instructions: Derived from the agent's instructions field (static or dynamically generated).
- Conversation history: The accumulated input items from the current run, including user messages, assistant messages, tool call results, and handoff messages.
- Tool schemas: JSON schemas for each tool the agent has access to, enabling the model to generate structured tool calls.
- Output schema: If the agent defines an output_type, the corresponding JSON schema is passed to constrain the model's final output format.
- Handoff definitions: Handoff targets are exposed to the model as special transfer tools.
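The assembly of these pieces might look like the following sketch. The function name, field names, and transfer-tool naming convention here are assumptions for illustration, not the SDK's exact wire format.

```python
def build_model_input(instructions, history, tool_schemas,
                      output_schema, handoff_names):
    """Combine agent configuration into one hypothetical request payload."""
    # Handoff targets are surfaced as extra "transfer" tools
    # alongside the agent's regular tools.
    transfer_tools = [
        {"type": "function", "name": f"transfer_to_{name}",
         "description": f"Hand off the conversation to the {name} agent."}
        for name in handoff_names
    ]
    payload = {
        "instructions": instructions,   # system instructions
        "input": list(history),         # accumulated conversation items
        "tools": tool_schemas + transfer_tools,
    }
    if output_schema is not None:       # constrain the final output shape
        payload["response_format"] = output_schema
    return payload

payload = build_model_input(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Hello"}],
    [{"type": "function", "name": "get_weather"}],
    None,
    ["billing_agent"],
)
print([t["name"] for t in payload["tools"]])
# ['get_weather', 'transfer_to_billing_agent']
```

The key design point is that handoffs ride along as ordinary tool schemas, so the model needs no special mechanism to request a transfer.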
Response Processing
The ModelResponse returned by the model layer contains:
- Output items: A list of response output items (text messages, tool calls, handoff requests) that the run loop iterates over.
- Usage statistics: Token counts (input, output, total) and detailed breakdowns for monitoring and cost tracking.
- Response ID: A unique identifier for the response, used for server-managed conversation continuity.
Tracing and Observability
The model execution layer integrates with the SDK's tracing system. Each model call is wrapped in a response span that records:
- The input sent to the model (if tracing data inclusion is enabled).
- The response received from the model.
- Any errors that occurred during the call.
This enables end-to-end observability of agent runs, from the initial user input through each model invocation to the final output.
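The wrap-and-record idea can be sketched with a context manager. This is illustrative only: the SDK's real spans come from its tracing module and carry span IDs, timestamps, and parent links that are omitted here.

```python
from contextlib import contextmanager

trace_log = []  # stand-in for the SDK's trace processor

@contextmanager
def response_span(model_input, include_data=True):
    """Record one model call's input, output, and any error."""
    record = {
        "input": model_input if include_data else None,  # data inclusion is optional
        "output": None,
        "error": None,
    }
    trace_log.append(record)
    try:
        yield record
    except Exception as exc:
        record["error"] = repr(exc)  # errors land on the span, then re-raise
        raise

with response_span([{"role": "user", "content": "Hi"}]) as span:
    span["output"] = "Hello!"  # a real call would store the model response here

print(trace_log[0]["output"])  # Hello!
```

Because the span wraps the call from outside, the invocation logic itself stays untouched, which is the decorator-style layering described under Theoretical Basis.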
Theoretical Basis
The Model Execution principle draws from:
- Adapter Pattern: Each model implementation (OpenAI Responses, Chat Completions, LiteLLM) adapts a specific API to the common Model interface.
- Facade Pattern: The get_response() method provides a simplified interface that hides the complexity of API parameter construction, error handling, and response parsing.
- Dependency Inversion: The run loop depends on the abstract Model interface rather than any specific API client, making it easy to swap backends.
- Decorator Pattern: Tracing wraps the model call in a span decorator without modifying the core invocation logic.
Usage
The model execution layer is internal to the SDK and is not typically called directly by users. The Runner invokes it automatically. However, understanding the model layer is important for:
- Implementing custom model providers.
- Debugging model invocation issues.
- Configuring model-specific settings via ModelSettings.
Internal Call Flow
```python
# This is what happens internally when Runner.run() is called:
# 1. Runner assembles system_instructions from agent.instructions
# 2. Runner collects tools, output_schema, and handoffs from the agent
# 3. Runner calls model.get_response() with all assembled parameters
# 4. The ModelResponse is processed by the run loop
model_response = await model.get_response(
    system_instructions="You are a helpful assistant.",
    input=[{"role": "user", "content": "Hello"}],
    model_settings=ModelSettings(temperature=0.7),
    tools=[my_function_tool],
    output_schema=None,
    handoffs=[],
    tracing=model_tracing,
)
```