Principle: OpenAI Agents Python Model Execution

From Leeroopedia
Property Value
Principle Name Model Execution
SDK OpenAI Agents Python
Repository openai-agents-python
Source Reference src/agents/models/openai_responses.py:L82-157
Import from agents.models.openai_responses import OpenAIResponsesModel

Overview

The Model Execution principle describes the abstraction layer for LLM invocation within the OpenAI Agents Python SDK. This layer is responsible for translating the agent's configuration (instructions, tools, output schema, handoffs) into an actual API call to a language model, and converting the raw API response into a standardized ModelResponse that the run loop can process.

Description

The model execution layer sits between the Runner (orchestration) and the external LLM API. Its responsibilities include:

  • Input Assembly: Combining system instructions, conversation history, tool schemas, output schema constraints, and handoff definitions into the format expected by the model provider's API.
  • API Invocation: Making the actual HTTP call to the LLM provider (e.g., OpenAI Responses API, Chat Completions API, or third-party providers via LiteLLM).
  • Response Normalization: Converting the provider-specific response format into a standardized ModelResponse containing output items, usage statistics, and a response ID.
  • Tracing Integration: Recording model inputs, outputs, and errors into the tracing system for observability and debugging.

Provider Abstraction

The SDK defines a Model abstract interface that any model backend must implement. The two primary methods are:

  1. get_response(): Non-streaming invocation that returns a complete ModelResponse.
  2. stream_response(): Streaming invocation that yields response chunks incrementally.

This abstraction allows the SDK to support multiple backends:

  • OpenAIResponsesModel: Uses the OpenAI Responses API (the default and recommended backend).
  • OpenAIChatCompletionsModel: Uses the OpenAI Chat Completions API for broader compatibility.
  • LitellmModel: Uses the LiteLLM library for access to 100+ LLM providers.
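The shape of this abstraction can be illustrated with a minimal, self-contained sketch. Note the simplifications: the real SDK methods are async and take many more parameters, and the `ModelResponse` fields and `EchoModel` backend below are invented stand-ins for illustration, not the SDK's actual classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ModelResponse:
    # Standardized result: output items, usage statistics, and a response ID
    output: list
    usage: dict
    response_id: str

class Model(ABC):
    """Abstract backend interface that every provider adapter implements."""

    @abstractmethod
    def get_response(self, system_instructions: str, input: list) -> ModelResponse:
        """Non-streaming invocation returning a complete ModelResponse."""

class EchoModel(Model):
    """Toy backend: pretends to call a provider, then normalizes the raw payload."""

    def get_response(self, system_instructions: str, input: list) -> ModelResponse:
        raw = {  # provider-specific wire format (invented for this sketch)
            "text": f"echo: {input[-1]['content']}",
            "id": "resp_123",
            "tokens": {"in": 3, "out": 3},
        }
        # Response normalization: provider payload -> standardized ModelResponse
        return ModelResponse(
            output=[{"type": "message", "content": raw["text"]}],
            usage={
                "input_tokens": raw["tokens"]["in"],
                "output_tokens": raw["tokens"]["out"],
                "total_tokens": raw["tokens"]["in"] + raw["tokens"]["out"],
            },
            response_id=raw["id"],
        )

resp = EchoModel().get_response("You are helpful.", [{"role": "user", "content": "Hi"}])
```

Because the run loop only ever sees `ModelResponse`, swapping `EchoModel` for a real provider adapter requires no orchestration changes.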

Input Construction

Before calling the model, the run loop constructs the full input payload:

  1. System instructions: Derived from the agent's instructions field (static or dynamically generated).
  2. Conversation history: The accumulated input items from the current run, including user messages, assistant messages, tool call results, and handoff messages.
  3. Tool schemas: JSON schemas for each tool the agent has access to, enabling the model to generate structured tool calls.
  4. Output schema: If the agent defines an output_type, the corresponding JSON schema is passed to constrain the model's final output format.
  5. Handoff definitions: Handoff targets are exposed to the model as special transfer tools.
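The five steps above can be sketched as a single assembly function. The payload field names and the `transfer_to_<name>` tool naming below are illustrative assumptions about the wire format, not the SDK's exact output:

```python
def build_model_input(instructions, history, tool_schemas, output_schema, handoff_names):
    """Assemble the model payload: steps 1-5 above, in order."""
    # 5. Handoff targets are exposed to the model as special transfer tools.
    handoff_tools = [
        {
            "name": f"transfer_to_{name}",
            "description": f"Hand off the conversation to the {name} agent",
            "parameters": {"type": "object", "properties": {}},
        }
        for name in handoff_names
    ]
    payload = {
        "instructions": instructions,           # 1. system instructions
        "input": list(history),                 # 2. accumulated conversation history
        "tools": tool_schemas + handoff_tools,  # 3 + 5. tool and transfer schemas
    }
    if output_schema is not None:
        payload["output_schema"] = output_schema  # 4. constrain the final output
    return payload

payload = build_model_input(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Hello"}],
    [{"name": "get_weather", "parameters": {"type": "object", "properties": {}}}],
    None,
    ["billing"],
)
```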

Response Processing

The ModelResponse returned by the model layer contains:

  • Output items: A list of response output items (text messages, tool calls, handoff requests) that the run loop iterates over.
  • Usage statistics: Token counts (input, output, total) and detailed breakdowns for monitoring and cost tracking.
  • Response ID: A unique identifier for the response, used for server-managed conversation continuity.
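A hypothetical dispatch loop over those output items might look like the following (the plain-dict items and their `type` strings are simplifications; the real SDK uses typed response item classes):

```python
def triage_output(model_response):
    """Split a ModelResponse's output items the way the run loop does."""
    messages, tool_calls, handoffs = [], [], []
    for item in model_response["output"]:
        kind = item["type"]
        if kind == "message":
            messages.append(item)    # final or intermediate text
        elif kind == "function_call":
            tool_calls.append(item)  # needs local tool execution
        elif kind == "handoff":
            handoffs.append(item)    # transfer control to another agent
    return messages, tool_calls, handoffs

response = {
    "output": [
        {"type": "message", "content": "Checking the weather..."},
        {"type": "function_call", "name": "get_weather", "arguments": "{}"},
    ],
    "usage": {"input_tokens": 12, "output_tokens": 9, "total_tokens": 21},
    "response_id": "resp_abc",
}
msgs, calls, offs = triage_output(response)
```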

Tracing and Observability

The model execution layer integrates with the SDK's tracing system. Each model call is wrapped in a response span that records:

  • The input sent to the model (if tracing data inclusion is enabled).
  • The response received from the model.
  • Any errors that occurred during the call.

This enables end-to-end observability of agent runs, from the initial user input through each model invocation to the final output.
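A toy version of that span wrapper, assuming a simple in-memory trace list (the real SDK exports spans to a tracing backend and uses its own span types):

```python
from contextlib import contextmanager

TRACE: list = []  # stand-in for the SDK's trace exporter

@contextmanager
def response_span(include_data: bool = True):
    """Record one model call: its input/output (if data inclusion is on) and any error."""
    span = {"type": "response", "input": None, "output": None, "error": None}
    try:
        yield span
    except Exception as exc:
        span["error"] = repr(exc)  # errors are captured even when the call fails
        raise
    finally:
        if not include_data:       # honor the data-inclusion setting
            span["input"] = span["output"] = None
        TRACE.append(span)

with response_span() as span:
    span["input"] = [{"role": "user", "content": "Hello"}]
    span["output"] = "Hi there!"   # pretend the model call succeeded
```

The `finally` clause ensures the span is recorded whether the call succeeds or raises, which is what makes failed invocations visible in traces.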

Theoretical Basis

The Model Execution principle draws from:

  • Adapter Pattern: Each model implementation (OpenAI Responses, Chat Completions, LiteLLM) adapts a specific API to the common Model interface.
  • Facade Pattern: The get_response() method provides a simplified interface that hides the complexity of API parameter construction, error handling, and response parsing.
  • Dependency Inversion: The run loop depends on the abstract Model interface rather than any specific API client, making it easy to swap backends.
  • Decorator Pattern: Tracing wraps the model call in a span decorator without modifying the core invocation logic.

Usage

The model execution layer is internal to the SDK and is not typically called directly by users. The Runner invokes it automatically. However, understanding the model layer is important for:

  • Implementing custom model providers.
  • Debugging model invocation issues.
  • Configuring model-specific settings via ModelSettings.

Internal Call Flow

# This is what happens internally when Runner.run() is called:
# 1. Runner assembles system_instructions from agent.instructions
# 2. Runner collects tools, output_schema, and handoffs from the agent
# 3. Runner calls model.get_response() with all assembled parameters
# 4. The resulting ModelResponse is processed by the run loop
#
# Illustrative call (the exact keyword set varies by SDK version):

model_response = await model.get_response(
    system_instructions="You are a helpful assistant.",
    input=[{"role": "user", "content": "Hello"}],
    model_settings=ModelSettings(temperature=0.7),
    tools=[my_function_tool],
    output_schema=None,
    handoffs=[],
    tracing=ModelTracing.ENABLED,  # or DISABLED / ENABLED_WITHOUT_DATA
)
