Implementation:Microsoft Autogen Studio Generate Image Tool

Sources	python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py
Domains	Tools, Image Generation, AI Art, DALL-E, AutoGen Studio
Last Updated	2026-02-11

Overview

Description

The Generate Image Tool is an AutoGen Studio utility that enables agents to create images from text descriptions using OpenAI's DALL-E 3 model. The tool provides a simple interface for text-to-image generation with configurable output options for image size and storage location.

This implementation handles the complete workflow from prompt submission to image generation, decoding, and file storage. It uses base64 encoding for image transfer and generates unique filenames to prevent conflicts.

Key Features

DALL-E 3 integration: Calls OpenAI's DALL-E 3 image generation model
Text-to-image generation: Creates images from natural language descriptions
Configurable sizing: The image_size parameter accepts three square sizes (1024x1024, 512x512, 256x256)
Automatic file management: Generates unique filenames and handles saving
Base64 decoding: Processes API responses in b64_json format
Configurable output: Option to specify output directory

Usage

The tool requires an OpenAI API key configured in the environment. It generates PNG images and saves them to the specified directory or current working directory.

Environment Setup:

export OPENAI_API_KEY="your_openai_api_key"

Basic Usage:

image_paths = await generate_image(
    query="A serene mountain landscape at sunset",
    image_size="1024x1024"
)
print(f"Image saved to: {image_paths[0]}")
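
Because generate_image is a coroutine, the call above has to run inside an event loop. The following is a minimal sketch of driving it from a plain script with asyncio.run. Here fake_generate_image is a hypothetical stand-in (not part of AutoGen Studio) so the sketch runs without an API key; in practice it would be replaced by the real import.

```python
import asyncio
from typing import List


async def fake_generate_image(query: str, image_size: str = "1024x1024") -> List[str]:
    """Hypothetical stand-in with the same call shape as generate_image."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return ["stand-in.png"]


async def main() -> List[str]:
    # Replace fake_generate_image with the real generate_image in practice.
    return await fake_generate_image(
        query="A serene mountain landscape at sunset",
        image_size="1024x1024",
    )


if __name__ == "__main__":
    image_paths = asyncio.run(main())
    print(f"Image saved to: {image_paths[0]}")
```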

Code Reference

Source Location

File: python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py
Repository: https://github.com/microsoft/autogen
Lines: 67 total

Function Signature

async def generate_image(
    query: str,
    output_dir: Optional[Path] = None,
    image_size: Literal["1024x1024", "512x512", "256x256"] = "1024x1024"
) -> List[str]

Import Statement

from autogenstudio.gallery.tools.generate_image import generate_image, generate_image_tool

Dependencies

Standard Library: base64, io, uuid, pathlib, typing
Third-party: openai, Pillow (PIL)
AutoGen: autogen_core.code_executor, autogen_core.tools

I/O Contract

Inputs

Parameter	Type	Default	Description
query	str	(required)	Natural language description of the desired image
output_dir	Optional[Path]	None	Directory to save generated images (current directory if None)
image_size	Literal["1024x1024", "512x512", "256x256"]	"1024x1024"	Size of the generated image in pixels

Outputs

Field	Type	Description
return	List[str]	List of file paths to the generated image files (currently always 1 image)

Output Details:

Images are saved as PNG files
Filenames are UUID-based (e.g., "a1b2c3d4-e5f6-7890-abcd-ef1234567890.png")
Paths are returned as strings (absolute or relative based on output_dir)
Currently generates 1 image per call (n=1 in API call)

Exceptions

The function may raise exceptions from the OpenAI API or file I/O operations:

openai.AuthenticationError: Invalid or missing API key
openai.RateLimitError: API quota exceeded
openai.APIError: OpenAI service errors
IOError: File system errors when saving images
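Invalid image_size: The Literal annotation is a static type hint and is not enforced by Python at call time, so the function itself does not raise ValueError; an unsupported size would most likely surface as an error returned by the OpenAI API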
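
A Literal annotation is checked by static type checkers, not by the Python interpreter when the function is called. If a runtime ValueError is wanted, a check along these lines could be run before calling the tool. This is a sketch only: validate_image_size and the ImageSize alias are hypothetical names, not part of the source file.

```python
from typing import Literal, get_args

# Mirrors the Literal used in the documented function signature.
ImageSize = Literal["1024x1024", "512x512", "256x256"]


def validate_image_size(image_size: str) -> str:
    """Hypothetical helper: raise ValueError for sizes outside the Literal."""
    allowed = get_args(ImageSize)
    if image_size not in allowed:
        raise ValueError(f"image_size must be one of {allowed}, got {image_size!r}")
    return image_size
```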

Implementation Details

Core Algorithm

Initialization Phase:
1. Create OpenAI client instance (uses OPENAI_API_KEY from environment)
Generation Phase:
1. Call OpenAI images.generate API with:
  1. model="dall-e-3"
  2. prompt=query
  3. n=1 (single image)
  4. response_format="b64_json" (base64-encoded image data)
  5. size=image_size
Processing Phase:
1. For each image in response.data:
  1. Generate unique filename using UUID
  2. Determine output path (output_dir or current directory)
  3. Extract base64 JSON data
  4. Decode base64 to binary image data
  5. Open image with PIL
  6. Save image as PNG file
  7. Add file path to saved_files list
Return Phase:
1. Return list of saved file paths
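
The Processing and Return phases above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the source implementation: save_images is a hypothetical name, the decoded bytes are written directly instead of being opened and re-saved with PIL, and fake_response is a stand-in that only imitates the data[i].b64_json shape of the API response.

```python
import base64
import uuid
from pathlib import Path
from types import SimpleNamespace
from typing import List, Optional


def save_images(response, output_dir: Optional[Path] = None) -> List[str]:
    """Decode each b64_json payload in response.data and write it to a .png file."""
    saved_files: List[str] = []
    for img in response.data:
        # Unique filename so repeated calls do not overwrite each other.
        file_name = f"{uuid.uuid4()}.png"
        file_path = Path(output_dir) / file_name if output_dir else Path(file_name)
        # Assumption: bytes are written as-is; the source re-encodes through PIL.
        image_bytes = base64.decodebytes(img.b64_json.encode("ascii"))
        file_path.write_bytes(image_bytes)
        saved_files.append(str(file_path))
    return saved_files


# Placeholder payload: the PNG signature followed by dummy bytes, not a real image.
PLACEHOLDER_BYTES = b"\x89PNG\r\n\x1a\n" + b"placeholder"

# Stand-in for the object returned by the images.generate call.
fake_response = SimpleNamespace(
    data=[SimpleNamespace(b64_json=base64.b64encode(PLACEHOLDER_BYTES).decode("ascii"))]
)
```

Calling save_images(fake_response, Path("some_dir")) returns a one-element list of paths, matching the List[str] return type described in the I/O contract.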

Image Format

API Response Format: JSON with base64-encoded image data (b64_json field)
Decode Method: base64.decodebytes()
Image Processing: PIL (Pillow) library
Output Format: PNG (lossless compression)
Filename Pattern: {uuid4()}.png

DALL-E 3 Specifics

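Model: dall-e-3
Generation Count: 1 image per request (n=1)
Supported Sizes: The function signature accepts 1024x1024, 512x512, and 256x256 pixels. OpenAI's API documentation lists 1024x1024, 1792x1024, and 1024x1792 as the sizes for dall-e-3, so the two smaller values may be rejected by the API; 1024x1024 is the safe choice
Quality: API default (the call does not pass a quality parameter)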
Style: Determined by prompt content and phrasing

Usage Examples

Example 1: Basic Image Generation

# Generate a single image with default settings
paths = await generate_image(
    query="A futuristic city with flying cars at night"
)

print(f"Image generated: {paths[0]}")
# Output: Image generated: a1b2c3d4-e5f6-7890-abcd-ef1234567890.png

Example 2: Custom Size

# Request a smaller image (OpenAI documents 512x512 for dall-e-2, not dall-e-3,
# so the API may reject this size)
paths = await generate_image(
    query="A cute cartoon cat wearing sunglasses",
    image_size="512x512"
)

Example 3: Custom Output Directory

from pathlib import Path

# Save to specific directory
output_path = Path("/tmp/generated_images")
output_path.mkdir(parents=True, exist_ok=True)  # create missing parent directories too

paths = await generate_image(
    query="An abstract painting with vibrant colors",
    output_dir=output_path
)

print(f"Image saved to: {paths[0]}")
# Output: Image saved to: /tmp/generated_images/a1b2c3d4-....png

Example 4: Detailed Prompt

# Use detailed, specific prompts for better results
detailed_query = """
A professional photograph of a modern minimalist office space.
Natural lighting from large windows. Clean white walls.
Wooden desk with a laptop and potted plant.
Soft shadows and warm tones. 4K quality.
"""

paths = await generate_image(
    query=detailed_query,
    image_size="1024x1024"
)

Example 5: Using the FunctionTool

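from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogenstudio.gallery.tools.generate_image import generate_image_tool

# The model name is an illustrative choice; any tool-calling chat model should work
model_client = OpenAIChatCompletionClient(model="gpt-4o")

# Add tool to an agent (AssistantAgent accepts FunctionTool instances via tools=)
artist_agent = AssistantAgent(
    name="artist",
    model_client=model_client,
    tools=[generate_image_tool]
)

# Agent can now process instructions like:
# "Generate an image of a sunset over the ocean"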

Example 6: Batch Generation

# Generate multiple images with different prompts
prompts = [
    "A red sports car",
    "A blue vintage bicycle",
    "A green motorcycle"
]

image_paths = []
for prompt in prompts:
    paths = await generate_image(query=prompt, image_size="1024x1024")
    image_paths.extend(paths)

print(f"Generated {len(image_paths)} images")
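
The loop above issues one request at a time. Where throughput matters, the requests could instead run concurrently with a cap on how many are in flight. The sketch below shows one way to do that with asyncio.gather and a semaphore. The names generate_batch, max_concurrency, and fake_generate are hypothetical; fake_generate stands in for generate_image so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_batch(
    prompts: List[str],
    generate: Callable[[str], Awaitable[List[str]]],
    max_concurrency: int = 2,
) -> List[str]:
    """Run generate(prompt) for every prompt, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt: str) -> List[str]:
        async with semaphore:
            return await generate(prompt)

    # gather preserves input order, so results line up with prompts.
    results = await asyncio.gather(*(run_one(p) for p in prompts))
    return [path for paths in results for path in paths]


async def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for generate_image; derives a path from the prompt."""
    await asyncio.sleep(0)
    return [prompt.replace(" ", "_") + ".png"]
```

With the real tool, the generate argument could be a small wrapper such as lambda p: generate_image(query=p). A low max_concurrency keeps the batch within API rate limits.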

Example 7: Error Handling

from pathlib import Path

from openai import OpenAIError

try:
    paths = await generate_image(
        query="A beautiful landscape",
        output_dir=Path("/invalid/path")
    )
except OpenAIError as e:
    print(f"OpenAI API error: {e}")
except IOError as e:
    print(f"File system error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
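
Transient failures such as rate limiting are often handled by retrying rather than just reporting the error. The following is a minimal sketch of exponential backoff around an async call. with_retries and its parameters are hypothetical names, not part of AutoGen Studio, and the exception types to retry on are supplied by the caller.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await call(); on a retry_on exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("attempts must be at least 1")
```

A call might look like await with_retries(lambda: generate_image(query="A beautiful landscape"), (openai.RateLimitError,)), assuming openai is imported. Errors not listed in retry_on propagate immediately.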

Prompt Engineering Tips

Effective Prompts

Be specific: Include details about style, lighting, composition, colors
Use descriptive language: "Professional photograph" vs. "picture"
Specify quality: "4K", "high resolution", "detailed"
Include artistic style: "oil painting", "watercolor", "digital art", "photorealistic"
Describe mood: "serene", "dramatic", "playful", "mysterious"

Example Good Prompts

# Photorealistic
"A professional photograph of a golden retriever puppy playing in a field
of wildflowers during golden hour, shallow depth of field, 4K quality"

# Artistic
"An impressionist oil painting of a Parisian café in autumn, warm colors,
soft brushstrokes, in the style of Claude Monet"

# Technical/Diagrammatic
"A clean, minimalist infographic showing the water cycle, flat design,
pastel colors, educational illustration"

Prompts to Avoid

Vague descriptions: "A nice picture"
Conflicting requirements: "Realistic cartoon"
Overly complex: Too many competing elements
Prohibited content: Violence, explicit content, copyrighted characters

Configuration

Environment Variables

Variable	Required	Description
OPENAI_API_KEY	Yes	OpenAI API key for authentication

Size Options

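Size	Dimensions	Notes
"1024x1024"	1024 × 1024 pixels	Default; listed by OpenAI as a supported dall-e-3 size
"512x512"	512 × 512 pixels	Accepted by the function signature; documented by OpenAI for dall-e-2, so dall-e-3 may reject it
"256x256"	256 × 256 pixels	Accepted by the function signature; documented by OpenAI for dall-e-2, so dall-e-3 may reject it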

API Costs

Note: DALL-E 3 API usage is billed per generated image, and the rate depends on the requested size and quality. Each call to this tool generates one image (n=1).

Check OpenAI pricing for current rates.

Related Pages

Implementation: Studio Bing Search Tool - Search tool for finding reference images
Implementation: Studio Google Search Tool - Alternative search with image results
Microsoft Autogen Studio Tools - Overview of all AutoGen Studio gallery tools
AutoGen Core Function Tools - Documentation on FunctionTool framework
AI Image Generation Best Practices - Guidelines for effective image generation
