Implementation:Openai Openai agents python Image Tool Output Pattern

From Leeroopedia
Knowledge Sources
Domains Tool_Integration, Multimodal, Design_Pattern
Last Updated 2026-02-11 00:00 GMT

Overview

Demonstrates returning image data from a function tool using ToolOutputImage or ToolOutputImageDict, allowing the model to receive and describe images fetched by tools.

Description

The image_tool_output.py example showcases the SDK's support for multimodal tool outputs. When a function tool needs to return an image to the model (rather than plain text), it can use either ToolOutputImage (a typed class) or ToolOutputImageDict (a typed dictionary) to wrap the image URL along with a detail level parameter. The SDK automatically converts this output into the appropriate format for the model to process as visual input.

The example defines a fetch_random_image function tool decorated with @function_tool that returns either a ToolOutputImageDict (a dictionary with "type", "image_url", and "detail" keys) or a ToolOutputImage instance depending on a toggle flag. Both approaches produce the same result: the image URL is sent to the model, which can then describe or reason about the image content. The example uses a sample Unsplash image URL of a London cityscape.

This pattern is essential for building agents that interact with visual content, such as image analysis tools, screenshot-based workflows, or any scenario where a tool fetches or generates images that the model needs to interpret.

Usage

Use this pattern when a function tool must return an image, rather than text, for the model to inspect. Typical cases are image retrieval, screenshot capture, and chart or graph generation, where the model needs to see and describe the visual output of a tool invocation.
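
For tools that produce an image locally (a screenshot or a rendered chart) rather than fetching a public URL, one possible approach is to encode the bytes as a base64 data URL and pass that string as image_url. This is a sketch under an assumption: the example on this page only demonstrates an https URL, so whether a data URL is accepted end to end should be verified before relying on it. The helper name to_data_url is hypothetical and not part of the SDK.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG file signature stands in for real image data here.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
data_url = to_data_url(PNG_SIGNATURE)

# Assumption (unverified against the SDK): a tool would pass this string
# where the examples on this page pass URL, for instance
#   return ToolOutputImage(image_url=data_url, detail="auto")
```

Keeping the encoding in a small helper leaves the tool body itself as short as the URL-based examples, and makes it easy to swap back to a hosted URL if data URLs turn out not to be suitable.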

Code Reference

Source Location

Signature

@function_tool
def fetch_random_image() -> ToolOutputImage | ToolOutputImageDict:
    """Fetch a random image."""
    return ToolOutputImage(image_url=URL, detail="auto")
    # or: return {"type": "image", "image_url": URL, "detail": "auto"}

Import

from agents import Agent, Runner, ToolOutputImage, ToolOutputImageDict, function_tool

I/O Contract

Inputs

Name | Type | Required | Description
(no parameters) | -- | -- | The fetch_random_image tool takes no arguments in this example

Outputs

Name | Type | Description
ToolOutputImage | ToolOutputImage | A typed object containing image_url (str) and detail ("auto", "low", or "high")
ToolOutputImageDict | dict | A TypedDict with keys "type" ("image"), "image_url" (str), and "detail" (str)
result.final_output | str | The model's text description of the fetched image
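
To make the dict-form contract in the table above concrete, here is a minimal standard-library sketch that builds a payload with the three listed keys and rejects detail values other than "auto", "low", or "high". The names ImagePayload and make_image_payload are hypothetical, and the local TypedDict is only a stand-in that mirrors the table; it is not the SDK's own ToolOutputImageDict definition.

```python
from typing import Literal, TypedDict, get_args

# Detail levels as listed in the Outputs table.
Detail = Literal["auto", "low", "high"]


class ImagePayload(TypedDict):
    """Local stand-in mirroring the documented keys (not the SDK type)."""

    type: Literal["image"]
    image_url: str
    detail: Detail


def make_image_payload(image_url: str, detail: Detail = "auto") -> ImagePayload:
    """Build a dict-form image output, validating inputs at runtime."""
    if not image_url:
        raise ValueError("image_url must be a non-empty string")
    if detail not in get_args(Detail):
        raise ValueError(f"detail must be one of {get_args(Detail)}, got {detail!r}")
    return {"type": "image", "image_url": image_url, "detail": detail}


payload = make_image_payload("https://example.com/sample.png", detail="low")
```

A function tool could return such a dict directly, in the same way as the TypedDict usage example on this page; validating the detail value up front surfaces a typo in the tool rather than in the model request.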

Usage Examples

Return Image from a Function Tool (Typed Class)

from agents import Agent, Runner, ToolOutputImage, function_tool

URL = "https://images.unsplash.com/photo-1505761671935-60b3a7427bad?auto=format&fit=crop&w=400&q=80"

@function_tool
def fetch_random_image() -> ToolOutputImage:
    """Fetch a random image."""
    return ToolOutputImage(image_url=URL, detail="auto")

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    tools=[fetch_random_image],
)

Return Image from a Function Tool (TypedDict)

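from agents import ToolOutputImageDict, function_tool

# Same sample image URL as in the typed-class example.
URL = "https://images.unsplash.com/photo-1505761671935-60b3a7427bad?auto=format&fit=crop&w=400&q=80"

@function_tool
def fetch_random_image() -> ToolOutputImageDict:
    """Fetch a random image."""
    return {"type": "image", "image_url": URL, "detail": "auto"}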

Run the Agent to Fetch and Describe an Image

import asyncio

from agents import Runner

# Uses the `agent` defined in the typed-class example above.
async def main():
    result = await Runner.run(
        agent,
        input="Fetch an image using the random_image tool, then describe it",
    )
    print(result.final_output)
    # Example output (model responses vary):
    # "This image features the famous clock tower, commonly known as Big Ben, ..."

asyncio.run(main())
