Implementation:Microsoft Autogen Studio Generate Image Tool
| Sources | python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py |
|---|---|
| Domains | Tools, Image Generation, AI Art, DALL-E, AutoGen Studio |
| Last Updated | 2026-02-11 |
Overview
Description
The Generate Image Tool is an AutoGen Studio utility that enables agents to create images from text descriptions using OpenAI's DALL-E 3 model. The tool provides a simple interface for text-to-image generation with configurable output options for image size and storage location.
This implementation handles the complete workflow from prompt submission to image generation, decoding, and file storage. It uses base64 encoding for image transfer and generates unique filenames to prevent conflicts.
Key Features
- DALL-E 3 integration: Uses OpenAI's DALL-E 3 image generation model
- Text-to-image generation: Creates images from natural language descriptions
- Flexible sizing: Supports three standard image sizes
- Automatic file management: Generates unique filenames and handles saving
- Base64 decoding: Processes API responses in b64_json format
- Configurable output: Option to specify output directory
Usage
The tool requires an OpenAI API key configured in the environment. It generates PNG images and saves them to the specified directory or current working directory.
Environment Setup:
export OPENAI_API_KEY="your_openai_api_key"
Basic Usage:
image_paths = await generate_image(
    query="A serene mountain landscape at sunset",
    image_size="1024x1024"
)
print(f"Image saved to: {image_paths[0]}")
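The call above uses `await`, so it has to run inside a coroutine (or in an environment with top-level await, such as a notebook). A minimal sketch of driving it from a regular script with `asyncio.run` might look like the following. `fake_generate_image` is a hypothetical stand-in, not part of AutoGen Studio, used so the sketch runs without an API key; in real use the imported `generate_image` would be called instead.

```python
import asyncio
import uuid
from typing import List


async def fake_generate_image(query: str, image_size: str = "1024x1024") -> List[str]:
    """Hypothetical stand-in for generate_image: returns a UUID-based filename only."""
    return [f"{uuid.uuid4()}.png"]


async def main() -> List[str]:
    # In real use, call generate_image(...) here instead of the stand-in.
    return await fake_generate_image(
        query="A serene mountain landscape at sunset",
        image_size="1024x1024",
    )


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
image_paths = asyncio.run(main())
print(f"Image saved to: {image_paths[0]}")
```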
Code Reference
Source Location
- File: python/packages/autogen-studio/autogenstudio/gallery/tools/generate_image.py
- Repository: https://github.com/microsoft/autogen
- Lines: 67 total
Function Signature
async def generate_image(
    query: str,
    output_dir: Optional[Path] = None,
    image_size: Literal["1024x1024", "512x512", "256x256"] = "1024x1024"
) -> List[str]
Import Statement
from autogenstudio.gallery.tools.generate_image import generate_image, generate_image_tool
Dependencies
- Standard Library: base64, io, uuid, pathlib, typing
- Third-party: openai, Pillow (PIL)
- AutoGen: autogen_core.code_executor, autogen_core.tools
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | Natural language description of the desired image |
| output_dir | Optional[Path] | None | Directory to save generated images (current directory if None) |
| image_size | Literal["1024x1024", "512x512", "256x256"] | "1024x1024" | Size of the generated image in pixels |
Outputs
| Field | Type | Description |
|---|---|---|
| return | List[str] | List of file paths to the generated image files (currently always 1 image) |
Output Details:
- Images are saved as PNG files
- Filenames are UUID-based (e.g., "a1b2c3d4-e5f6-7890-abcd-ef1234567890.png")
- Paths are returned as strings: prefixed with output_dir when it is given, otherwise the bare filename relative to the current working directory
- Currently generates 1 image per call (n=1 in API call)
Exceptions
The function may raise exceptions from the OpenAI API or file I/O operations:
- openai.AuthenticationError: Invalid or missing API key
- openai.RateLimitError: API quota exceeded
- openai.APIError: OpenAI service errors
- IOError: File system errors when saving images
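- openai.BadRequestError: Unsupported image_size value rejected by the API (the Literal annotation is checked by static type checkers only, not at runtime, so the function itself does not raise ValueError)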
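`Literal` annotations are read by static type checkers and are not checked when the function runs. A caller that wants an early, local failure for a bad size can validate the value before calling the tool. The sketch below is one way to do that; `ImageSize` and `validate_image_size` are hypothetical names introduced here, not part of the tool.

```python
from typing import Literal, get_args

# Mirrors the options in the documented signature (alias name is an assumption).
ImageSize = Literal["1024x1024", "512x512", "256x256"]


def validate_image_size(size: str) -> str:
    """Raise ValueError before any API call if size is not a documented option."""
    allowed = get_args(ImageSize)  # tuple of the literal strings
    if size not in allowed:
        raise ValueError(f"image_size must be one of {allowed}, got {size!r}")
    return size
```

Calling `validate_image_size(image_size)` before `generate_image` turns a remote API rejection into a local `ValueError`.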
Implementation Details
Core Algorithm
- Initialization Phase:
  - Create OpenAI client instance (uses OPENAI_API_KEY from environment)
- Generation Phase:
  - Call OpenAI images.generate API with:
    - model="dall-e-3"
    - prompt=query
    - n=1 (single image)
    - response_format="b64_json" (base64-encoded image data)
    - size=image_size
- Processing Phase:
  - For each image in response.data:
    - Generate unique filename using UUID
    - Determine output path (output_dir or current directory)
    - Extract base64 JSON data
    - Decode base64 to binary image data
    - Open image with PIL
    - Save image as PNG file
    - Add file path to saved_files list
- Return Phase:
  - Return list of saved file paths
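The processing and return phases can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's code: `save_b64_images` is a hypothetical helper, it writes the decoded bytes directly instead of opening and re-saving them with PIL as the tool does, and skipping empty entries is an assumption.

```python
import base64
import tempfile
import uuid
from pathlib import Path
from typing import Iterable, List, Optional


def save_b64_images(
    b64_items: Iterable[Optional[str]], output_dir: Optional[Path] = None
) -> List[str]:
    """Decode each base64 string and save it under a UUID-based .png filename."""
    saved_files: List[str] = []
    for b64_str in b64_items:
        if not b64_str:
            continue  # assumption: entries without image data are skipped
        file_name = f"{uuid.uuid4()}.png"
        # Use output_dir when given, otherwise the current directory.
        file_path = Path(output_dir) / file_name if output_dir else Path(file_name)
        image_bytes = base64.decodebytes(b64_str.encode("utf-8"))
        file_path.write_bytes(image_bytes)  # the tool uses PIL here instead
        saved_files.append(str(file_path))
    return saved_files


# Usage with placeholder bytes standing in for real PNG data.
with tempfile.TemporaryDirectory() as tmp:
    payload = b"placeholder bytes standing in for PNG data"
    encoded = base64.b64encode(payload).decode("ascii")
    paths = save_b64_images([encoded, None], Path(tmp))
    round_trip = Path(paths[0]).read_bytes()
```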
Image Format
- API Response Format: JSON response carrying base64-encoded image data (b64_json field)
- Decode Method: base64.decodebytes()
- Image Processing: PIL (Pillow) library
- Output Format: PNG (lossless compression)
- Filename Pattern: {uuid4()}.png
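To make the decode step concrete, the sketch below round-trips a tiny PNG through base64 with `base64.decodebytes()` and checks the PNG file signature on the result. The `tiny_png` builder and `looks_like_png` check are hypothetical helpers added only to keep the example self-contained; the tool itself receives its base64 data from the API and hands the decoded bytes to PIL.

```python
import base64
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Return a 1x1, 8-bit grayscale PNG built by hand (illustration only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    raw_scanline = b"\x00\x00"  # filter type 0 followed by one black pixel
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw_scanline))
        + _chunk(b"IEND", b"")
    )


def looks_like_png(data: bytes) -> bool:
    """Cheap sanity check on decoded bytes before handing them to an image library."""
    return data.startswith(PNG_SIGNATURE)


# Simulate the b64_json field, then decode it.
b64_json = base64.b64encode(tiny_png()).decode("ascii")
decoded = base64.decodebytes(bytes(b64_json, "utf-8"))
```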
DALL-E 3 Specifics
- Model: dall-e-3 (OpenAI's DALL-E 3 image generation model)
- Generation Count: 1 image per request (n=1)
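- Supported Sizes: The function signature accepts 1024x1024, 512x512, and 256x256 pixels. OpenAI's API reference lists 512x512 and 256x256 as DALL-E 2 sizes (DALL-E 3 takes 1024x1024, 1792x1024, or 1024x1792), so the two smaller values may be rejected when model="dall-e-3"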
- Quality: High-quality photorealistic or artistic images
- Style: Determined by prompt content and phrasing
Usage Examples
Example 1: Basic Image Generation
# Generate a single image with default settings
paths = await generate_image(
    query="A futuristic city with flying cars at night"
)
print(f"Image generated: {paths[0]}")
# Output: Image generated: a1b2c3d4-e5f6-7890-abcd-ef1234567890.png
Example 2: Custom Size
# Generate a smaller image for faster processing
paths = await generate_image(
    query="A cute cartoon cat wearing sunglasses",
    image_size="512x512"
)
Example 3: Custom Output Directory
from pathlib import Path
# Save to a specific directory; create it (and any missing parents) first
output_path = Path("/tmp/generated_images")
output_path.mkdir(parents=True, exist_ok=True)
paths = await generate_image(
    query="An abstract painting with vibrant colors",
    output_dir=output_path
)
print(f"Image saved to: {paths[0]}")
# Output: Image saved to: /tmp/generated_images/a1b2c3d4-....png
Example 4: Detailed Prompt
# Use detailed, specific prompts for better results
detailed_query = """
A professional photograph of a modern minimalist office space.
Natural lighting from large windows. Clean white walls.
Wooden desk with a laptop and potted plant.
Soft shadows and warm tones. 4K quality.
"""
paths = await generate_image(
    query=detailed_query,
    image_size="1024x1024"
)
Example 5: Using the FunctionTool
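from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogenstudio.gallery.tools.generate_image import generate_image_tool
# Add the tool to an agent. AssistantAgent and the "gpt-4o" model name are
# assumptions for illustration; use whichever tool-capable agent your setup provides.
artist_agent = AssistantAgent(
    name="artist",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[generate_image_tool]
)
# Agent can now process instructions like:
# "Generate an image of a sunset over the ocean"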
Example 6: Batch Generation
# Generate multiple images with different prompts
prompts = [
    "A red sports car",
    "A blue vintage bicycle",
    "A green motorcycle"
]
image_paths = []
for prompt in prompts:
    # Each call returns a list with one path; collect them all
    paths = await generate_image(query=prompt, image_size="512x512")
    image_paths.extend(paths)
print(f"Generated {len(image_paths)} images")
Example 7: Error Handling
from pathlib import Path
from openai import OpenAIError
try:
    paths = await generate_image(
        query="A beautiful landscape",
        output_dir=Path("/invalid/path")
    )
except OpenAIError as e:
    # Authentication, rate-limit, and other API failures
    print(f"OpenAI API error: {e}")
except IOError as e:
    # Raised when the image cannot be written, e.g. a missing directory
    print(f"File system error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Prompt Engineering Tips
Effective Prompts
- Be specific: Include details about style, lighting, composition, colors
- Use descriptive language: "Professional photograph" vs. "picture"
- Specify quality: "4K", "high resolution", "detailed"
- Include artistic style: "oil painting", "watercolor", "digital art", "photorealistic"
- Describe mood: "serene", "dramatic", "playful", "mysterious"
Example Good Prompts
# Photorealistic
"A professional photograph of a golden retriever puppy playing in a field of wildflowers during golden hour, shallow depth of field, 4K quality"
# Artistic
"An impressionist oil painting of a Parisian café in autumn, warm colors, soft brushstrokes, in the style of Claude Monet"
# Technical/Diagrammatic
"A clean, minimalist infographic showing the water cycle, flat design, pastel colors, educational illustration"
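Prompts like these can also be assembled from their parts, which helps keep style, lighting, mood, and quality cues consistent across calls. The helper below is a hypothetical sketch (`build_prompt` is not part of the tool); it simply joins the non-empty parts with commas.

```python
from typing import Optional


def build_prompt(
    subject: str,
    style: Optional[str] = None,
    lighting: Optional[str] = None,
    mood: Optional[str] = None,
    quality: Optional[str] = None,
) -> str:
    """Join the subject and any provided descriptors into one comma-separated prompt."""
    parts = [subject, style, lighting, mood, quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "A golden retriever puppy playing in a field of wildflowers",
    style="professional photograph",
    lighting="golden hour",
    quality="4K quality",
)
```

The resulting string can be passed as the `query` argument of `generate_image`.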
Prompts to Avoid
- Vague descriptions: "A nice picture"
- Conflicting requirements: "Realistic cartoon"
- Overly complex: Too many competing elements
- Prohibited content: Violence, explicit content, copyrighted characters
Configuration
Environment Variables
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key for authentication |
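Since the key is read from the environment, a caller can check for it up front and fail with a clear message instead of waiting for an authentication error from the API. A minimal sketch, assuming a hypothetical helper named `require_openai_api_key`:

```python
import os


def require_openai_api_key() -> str:
    """Return OPENAI_API_KEY, or raise RuntimeError if it is missing or blank."""
    api_key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before calling generate_image"
        )
    return api_key
```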
Size Options
| Size | Dimensions | Use Case |
|---|---|---|
| "1024x1024" | 1024 × 1024 pixels | High quality, detailed images (default) |
| "512x512" | 512 × 512 pixels | Medium quality, faster generation |
| "256x256" | 256 × 256 pixels | Low quality, quickest generation, prototyping |
API Costs
Note: DALL-E 3 API usage incurs costs based on image size:
- 1024x1024: Higher cost per image
- 512x512: Medium cost per image
- 256x256: Lower cost per image
Check OpenAI pricing for current rates.
Related Pages
- Implementation: Studio Bing Search Tool - Search tool for finding reference images
- Implementation: Studio Google Search Tool - Alternative search with image results
- Microsoft Autogen Studio Tools - Overview of all AutoGen Studio gallery tools
- AutoGen Core Function Tools - Documentation on FunctionTool framework
- AI Image Generation Best Practices - Guidelines for effective image generation