Implementation:Microsoft Autogen Studio Generate Image Tool

From Leeroopedia
Sources: python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py
Domains: Tools, Image Generation, AI Art, DALL-E, AutoGen Studio
Last Updated: 2026-02-11

Overview

Description

The Generate Image Tool is an AutoGen Studio utility that enables agents to create images from text descriptions using OpenAI's DALL-E 3 model. The tool provides a simple interface for text-to-image generation with configurable output options for image size and storage location.

This implementation handles the complete workflow from prompt submission to image generation, decoding, and file storage. It uses base64 encoding for image transfer and generates unique filenames to prevent conflicts.

Key Features

  • DALL-E 3 integration: Calls OpenAI's dall-e-3 model through the Images API
  • Text-to-image generation: Creates images from natural language descriptions
  • Flexible sizing: Accepts three square sizes in its signature (1024x1024, 512x512, 256x256)
  • Automatic file management: Generates unique filenames and handles saving
  • Base64 decoding: Processes API responses in b64_json format
  • Configurable output: Option to specify output directory

Usage

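The tool requires an OpenAI API key in the OPENAI_API_KEY environment variable. It saves each generated image as a PNG file in the specified directory, or in the current working directory when none is given. Because generate_image is a coroutine, it must be awaited from async code.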

Environment Setup:

export OPENAI_API_KEY="your_openai_api_key"
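
Because the tool depends on OPENAI_API_KEY being present, a script can check for the variable up front and fail with a clear message instead of an authentication error later. The following is a minimal sketch; the helper name require_api_key is hypothetical and is not part of the tool.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment or raise a descriptive error.

    Hypothetical helper for illustration; generate_image does not call it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before calling generate_image"
        )
    return key
```

Calling require_api_key() once at startup is enough; the returned value does not need to be passed to the tool, which reads the environment itself.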

Basic Usage:

image_paths = await generate_image(
    query="A serene mountain landscape at sunset",
    image_size="1024x1024"
)
print(f"Image saved to: {image_paths[0]}")
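
The snippet above uses a bare await, which works in a notebook or inside another coroutine. In a plain script the call can be wrapped with asyncio.run. The sketch below uses a stand-in coroutine, fake_generate_image (hypothetical, not part of the tool), so that it runs without an API key; replace it with the real generate_image to produce an image.

```python
import asyncio
from typing import List


async def fake_generate_image(query: str, image_size: str = "1024x1024") -> List[str]:
    """Stand-in with the same call shape as generate_image; makes no API call."""
    await asyncio.sleep(0)
    return ["placeholder.png"]


async def main() -> List[str]:
    # Replace fake_generate_image with the real generate_image in actual use.
    return await fake_generate_image(
        query="A serene mountain landscape at sunset",
        image_size="1024x1024",
    )


def run() -> List[str]:
    """Synchronous entry point that drives the coroutine to completion."""
    return asyncio.run(main())
```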

Code Reference

Source Location

  • File: python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py

Function Signature

async def generate_image(
    query: str,
    output_dir: Optional[Path] = None,
    image_size: Literal["1024x1024", "512x512", "256x256"] = "1024x1024"
) -> List[str]

Import Statement

from autogenstudio.gallery.tools.generate_image import generate_image, generate_image_tool

Dependencies

  • Standard Library: base64, io, uuid, pathlib, typing
  • Third-party: openai, Pillow (PIL)
  • AutoGen: autogen_core.code_executor, autogen_core.tools
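
A quick way to confirm that the non-standard-library dependencies are installed is to look up their import names before importing the tool. This is a sketch only; missing_modules and REQUIRED are hypothetical names, and the list assumes the usual top-level import names (Pillow is imported as PIL).

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Top-level import names for the third-party and AutoGen dependencies.
REQUIRED = ["openai", "PIL", "autogen_core"]
```

missing_modules(REQUIRED) returns an empty list when everything is importable, and the names of the missing packages otherwise.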

I/O Contract

Inputs

Parameter | Type | Default | Description
query | str | (required) | Natural language description of the desired image
output_dir | Optional[Path] | None | Directory to save generated images (current directory if None)
image_size | Literal["1024x1024", "512x512", "256x256"] | "1024x1024" | Size of the generated image in pixels

Outputs

Field | Type | Description
return | List[str] | List of file paths to the generated image files (currently always 1 image)

Output Details:

  • Images are saved as PNG files
  • Filenames are UUID-based (e.g., "a1b2c3d4-e5f6-7890-abcd-ef1234567890.png")
  • Paths are returned as strings: the bare filename (relative to the current working directory) when output_dir is None, otherwise output_dir joined with the filename
  • Currently generates 1 image per call (n=1 in API call)
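
Callers that mix generated files with other images can recognise the tool's output by the filename pattern described above. The sketch below is illustrative; looks_like_generated_file is a hypothetical helper, and it only checks the name, not the file contents.

```python
import uuid
from pathlib import Path


def looks_like_generated_file(path: str) -> bool:
    """True if the file name is '<uuid>.png', the pattern described above."""
    p = Path(path)
    if p.suffix.lower() != ".png":
        return False
    try:
        uuid.UUID(p.stem)  # raises ValueError for anything that is not a UUID
    except ValueError:
        return False
    return True
```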

Exceptions

The function may raise exceptions from the OpenAI API or file I/O operations:

  • openai.AuthenticationError: Invalid or missing API key
  • openai.RateLimitError: API quota exceeded
  • openai.APIError: OpenAI service errors
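  • OSError (IOError is an alias in Python 3): File system errors when saving images
  • openai.BadRequestError: Request rejected by the API, for example an image_size the model does not support. Literal type hints are checked by static type checkers, not at runtime, so the function itself does not raise ValueError for an invalid size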
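
Python does not check Literal type hints at runtime, so a caller who wants an early ValueError for a bad size has to validate explicitly before calling the tool. The following is a minimal sketch; ImageSize and validate_image_size are hypothetical names that mirror the values in the function signature.

```python
from typing import Literal, get_args

# Mirrors the Literal in the generate_image signature.
ImageSize = Literal["1024x1024", "512x512", "256x256"]


def validate_image_size(size: str) -> str:
    """Raise ValueError unless size is one of the values in the signature."""
    allowed = get_args(ImageSize)
    if size not in allowed:
        raise ValueError(f"image_size must be one of {allowed}, got {size!r}")
    return size
```

This only checks the value against the signature; whether the API accepts a given size for the model is decided by OpenAI at request time.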

Implementation Details

Core Algorithm

  1. Initialization Phase:
    1. Create OpenAI client instance (uses OPENAI_API_KEY from environment)
  2. Generation Phase:
    1. Call OpenAI images.generate API with:
      1. model="dall-e-3"
      2. prompt=query
      3. n=1 (single image)
      4. response_format="b64_json" (base64-encoded image data)
      5. size=image_size
  3. Processing Phase:
    1. For each image in response.data:
      1. Generate unique filename using UUID
      2. Determine output path (output_dir or current directory)
      3. Read the base64 string from the image's b64_json field
      4. Decode base64 to binary image data
      5. Open image with PIL
      6. Save image as PNG file
      7. Add file path to saved_files list
  4. Return Phase:
    1. Return list of saved file paths
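
The processing and return phases above can be sketched without calling the API. The function below takes base64 strings directly and follows the same steps: UUID filename, output path, decode, save, collect. It is an approximation under one stated assumption: it writes the decoded bytes straight to disk, whereas the tool opens them with PIL and saves through it. The name save_b64_images is hypothetical.

```python
import base64
import uuid
from pathlib import Path
from typing import Iterable, List, Optional


def save_b64_images(
    b64_items: Iterable[str], output_dir: Optional[Path] = None
) -> List[str]:
    """Decode each base64 string and write it to '<uuid4>.png'.

    Assumption: bytes are written directly; the tool itself opens the
    decoded data with PIL and saves through it.
    """
    saved_files: List[str] = []
    for b64_str in b64_items:
        file_name = f"{uuid.uuid4()}.png"
        # Use output_dir when given, otherwise a path relative to the
        # current working directory.
        file_path = Path(output_dir) / file_name if output_dir else Path(file_name)
        image_bytes = base64.decodebytes(b64_str.encode("utf-8"))
        file_path.write_bytes(image_bytes)
        saved_files.append(str(file_path))
    return saved_files
```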

Image Format

  • API Response Format: JSON response carrying the image as a base64 string (b64_json field)
  • Decode Method: base64.decodebytes()
  • Image Processing: PIL (Pillow) library
  • Output Format: PNG (lossless compression)
  • Filename Pattern: {uuid4()}.png
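
To check that a decoded payload really is a PNG of the expected size, the PNG header can be inspected with the standard library alone. This sketch is not part of the tool (which relies on PIL); png_size_from_b64 is a hypothetical helper that reads the width and height from the IHDR chunk.

```python
import base64
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_b64(b64_str: str) -> Tuple[int, int]:
    """Decode a base64 PNG and return (width, height) from its IHDR chunk."""
    data = base64.decodebytes(b64_str.encode("utf-8"))
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("decoded data is not a PNG")
    # Width and height are big-endian 32-bit integers at offsets 16 and 20.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```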

DALL-E 3 Specifics

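  • Model: dall-e-3
  • Generation Count: 1 image per request (n=1, the only value dall-e-3 accepts)
  • Supported Sizes: The signature accepts 1024x1024, 512x512, and 256x256. OpenAI's API reference lists 1024x1024, 1792x1024, and 1024x1792 for dall-e-3 and reserves 512x512 and 256x256 for dall-e-2, so the two smaller sizes may be rejected by the API
  • Quality: The call does not set the quality parameter, so the API default applies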
  • Style: Determined by prompt content and phrasing

Usage Examples

Example 1: Basic Image Generation

# Generate a single image with default settings
paths = await generate_image(
    query="A futuristic city with flying cars at night"
)

print(f"Image generated: {paths[0]}")
# Output: Image generated: a1b2c3d4-e5f6-7890-abcd-ef1234567890.png

Example 2: Custom Size

# Request a smaller size (accepted by the signature; OpenAI documents
# 512x512 for dall-e-2, so dall-e-3 may reject it)
paths = await generate_image(
    query="A cute cartoon cat wearing sunglasses",
    image_size="512x512"
)

Example 3: Custom Output Directory

from pathlib import Path

# Save to specific directory
output_path = Path("/tmp/generated_images")
output_path.mkdir(parents=True, exist_ok=True)

paths = await generate_image(
    query="An abstract painting with vibrant colors",
    output_dir=output_path
)

print(f"Image saved to: {paths[0]}")
# Output: Image saved to: /tmp/generated_images/a1b2c3d4-....png

Example 4: Detailed Prompt

# Use detailed, specific prompts for better results
detailed_query = """
A professional photograph of a modern minimalist office space.
Natural lighting from large windows. Clean white walls.
Wooden desk with a laptop and potted plant.
Soft shadows and warm tones. 4K quality.
"""

paths = await generate_image(
    query=detailed_query,
    image_size="1024x1024"
)

Example 5: Using the FunctionTool

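from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogenstudio.gallery.tools.generate_image import generate_image_tool

# Add the tool to an agent. AssistantAgent and the model name are assumptions;
# the original snippet did not show where its agent class came from.
model_client = OpenAIChatCompletionClient(model="gpt-4o")
artist_agent = AssistantAgent(
    name="artist",
    model_client=model_client,
    tools=[generate_image_tool],
)

# Agent can now process instructions like:
# "Generate an image of a sunset over the ocean"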

Example 6: Batch Generation

# Generate multiple images with different prompts
prompts = [
    "A red sports car",
    "A blue vintage bicycle",
    "A green motorcycle"
]

image_paths = []
for prompt in prompts:
    paths = await generate_image(query=prompt, image_size="512x512")
    image_paths.extend(paths)

print(f"Generated {len(image_paths)} images")
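
In the loop above, one failed prompt aborts the whole batch. A variant that records failures per prompt and keeps going can be sketched as follows. The generator is passed in as a parameter so the sketch runs without an API key; generate_batch and fake_generate are hypothetical names, and fake_generate simulates a failure for one prompt.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Generator = Callable[[str], Awaitable[List[str]]]


async def generate_batch(
    prompts: List[str], generate: Generator
) -> Tuple[List[str], Dict[str, str]]:
    """Run prompts one at a time, collecting paths and per-prompt errors."""
    paths: List[str] = []
    errors: Dict[str, str] = {}
    for prompt in prompts:
        try:
            paths.extend(await generate(prompt))
        except Exception as exc:  # keep going; record what failed
            errors[prompt] = f"{type(exc).__name__}: {exc}"
    return paths, errors


async def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for generate_image; fails on one prompt."""
    if "bicycle" in prompt:
        raise RuntimeError("simulated API failure")
    return [f"{prompt.replace(' ', '_')}.png"]
```

With the real tool, the second argument could be a small wrapper such as lambda p: generate_image(query=p).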

Example 7: Error Handling

from pathlib import Path

from openai import OpenAIError

try:
    paths = await generate_image(
        query="A beautiful landscape",
        output_dir=Path("/invalid/path")
    )
except OpenAIError as e:
    print(f"OpenAI API error: {e}")
except IOError as e:
    print(f"File system error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
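
According to the algorithm described on this page, the file is written only after the API call returns, so an unusable output_dir is discovered after an image has already been generated and billed. Checking the directory first avoids that. The sketch below is illustrative; ensure_writable_dir is a hypothetical helper, not part of the tool.

```python
import os
from pathlib import Path


def ensure_writable_dir(path: Path) -> Path:
    """Create the directory if needed and confirm it is writable."""
    # Raises OSError (or a subclass) if the directory cannot be created.
    path.mkdir(parents=True, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} is not writable")
    return path
```

The returned path can be passed straight to generate_image as output_dir.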

Prompt Engineering Tips

Effective Prompts

  • Be specific: Include details about style, lighting, composition, colors
  • Use descriptive language: "Professional photograph" vs. "picture"
  • Specify quality: "4K", "high resolution", "detailed"
  • Include artistic style: "oil painting", "watercolor", "digital art", "photorealistic"
  • Describe mood: "serene", "dramatic", "playful", "mysterious"

Example Good Prompts

# Photorealistic
"A professional photograph of a golden retriever puppy playing in a field
of wildflowers during golden hour, shallow depth of field, 4K quality"

# Artistic
"An impressionist oil painting of a Parisian café in autumn, warm colors,
soft brushstrokes, in the style of Claude Monet"

# Technical/Diagrammatic
"A clean, minimalist infographic showing the water cycle, flat design,
pastel colors, educational illustration"
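
The tips above amount to combining a subject with a style, a few concrete details, and a mood. One way to keep prompts consistent across calls is to assemble them from those parts. This is a sketch of one possible convention; build_prompt and its parameters are hypothetical, and nothing in the tool requires prompts to be built this way.

```python
from typing import Optional, Sequence


def build_prompt(
    subject: str,
    style: Optional[str] = None,
    details: Sequence[str] = (),
    mood: Optional[str] = None,
) -> str:
    """Join the non-empty parts into one comma-separated prompt string."""
    lead = f"{style} of {subject}" if style else subject
    parts = [lead, *details]
    if mood:
        parts.append(f"{mood} mood")
    return ", ".join(part.strip() for part in parts if part and part.strip())
```

The resulting string is passed to generate_image as the query argument.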

Prompts to Avoid

  • Vague descriptions: "A nice picture"
  • Conflicting requirements: "Realistic cartoon"
  • Overly complex: Too many competing elements
  • Prohibited content: Violence, explicit content, copyrighted characters

Configuration

Environment Variables

Variable | Required | Description
OPENAI_API_KEY | Yes | OpenAI API key for authentication

Size Options

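Size | Dimensions | Use Case
"1024x1024" | 1024 × 1024 pixels | Default; the only one of the three sizes that OpenAI documents for dall-e-3
"512x512" | 512 × 512 pixels | Accepted by the signature; documented by OpenAI for dall-e-2, so dall-e-3 may reject it
"256x256" | 256 × 256 pixels | Accepted by the signature; documented by OpenAI for dall-e-2, so dall-e-3 may reject it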

API Costs

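Note: DALL-E 3 API usage is billed per generated image, and the rate depends on image size and quality tier:

  • 1024x1024: The tool's default; billed at OpenAI's dall-e-3 rate for that size
  • 512x512 and 256x256: Cheaper tiers in OpenAI's price list, which publishes them for dall-e-2 rather than dall-e-3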

Check OpenAI pricing for current rates.
