Principle:Togethercomputer Together python Image Prompt Construction
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Image_Generation, API_Client |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
A pattern for constructing text prompts and negative prompts to guide image generation models hosted on Together AI.
Description
Image Prompt Construction involves crafting text descriptions that direct diffusion models to generate desired images. The primary prompt describes what to generate, while the optional negative prompt specifies what to avoid. Prompt quality significantly affects generation results.
In the Together Python SDK, image prompts are passed as plain strings to the client.images.generate() method. The prompt parameter is required and describes the desired image content. The negative_prompt parameter is optional and instructs the model to steer away from unwanted visual elements, styles, or artifacts.
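As a minimal sketch of how the two parameters might be passed, the snippet below assembles the keyword arguments and forwards them to client.images.generate(). The helper build_generate_kwargs and the model name are assumptions made for illustration, not part of the SDK; substitute a model available on your account, and note that not every model honors negative_prompt.

```python
from typing import Any, Optional


def build_generate_kwargs(
    prompt: str,
    model: str,
    negative_prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for client.images.generate() (hypothetical helper)."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt is required and must not be empty")
    kwargs: dict[str, Any] = {"prompt": cleaned, "model": model}
    # Only send negative_prompt when there is actually something to avoid.
    if negative_prompt and negative_prompt.strip():
        kwargs["negative_prompt"] = negative_prompt.strip()
    return kwargs


def main() -> None:
    # Requires `pip install together` and TOGETHER_API_KEY in the environment.
    from together import Together

    client = Together()
    kwargs = build_generate_kwargs(
        prompt="A serene mountain lake at sunset, photorealistic",
        model="stabilityai/stable-diffusion-xl-base-1.0",  # assumed model name
        negative_prompt="blurry, low quality, watermark, text",
    )
    response = client.images.generate(**kwargs)
    print(response.data[0])
```

Calling main() sends the request. Keeping the argument assembly separate from the network call makes the prompt handling easy to check without an API key.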
Effective prompts share several characteristics:
- Specificity: Precise descriptions of subjects, settings, and composition yield better results than vague instructions.
- Style guidance: Including artistic style references (e.g., "oil painting", "photorealistic", "digital art") helps direct the visual output.
- Detail ordering: Placing the most important elements early in the prompt tends to give them stronger influence on the generation.
- Negative prompt utilization: Listing common artifacts or unwanted elements (e.g., "blurry", "low quality", "extra limbs") in the negative prompt can improve output quality.
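The characteristics above can be sketched as a small container that keeps the subject first, then style, then secondary details, and that tidies the negative prompt. PromptSpec and its field names are hypothetical and exist only for this illustration; they are not part of the Together SDK.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the pieces of an image prompt."""

    subject: str
    style: str = ""
    details: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)

    def prompt(self) -> str:
        # Subject first, then style, then secondary details.
        parts = [self.subject, self.style, *self.details]
        return ", ".join(p.strip() for p in parts if p and p.strip())

    def negative_prompt(self) -> str:
        # Drop blanks and case-insensitive duplicates, keeping first-seen order.
        seen = set()
        kept = []
        for term in self.avoid:
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                kept.append(term.strip())
        return ", ".join(kept)


spec = PromptSpec(
    subject="A red fox sitting on a snow-covered log",
    style="oil painting",
    details=["soft morning light", "shallow depth of field"],
    avoid=["blurry", "low quality", "Blurry", "extra limbs"],
)
print(spec.prompt())
print(spec.negative_prompt())  # blurry, low quality, extra limbs
```

The two strings returned by prompt() and negative_prompt() are what would be passed as the prompt and negative_prompt arguments.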
Usage
Apply this principle whenever you formulate prompts for text-to-image generation through client.images.generate() in the Together AI image generation API. Careful prompt construction matters regardless of which diffusion model is selected.
Theoretical Basis
Text-to-image diffusion models use text embeddings from prompts to condition the denoising process. The workflow operates as follows:
- The text prompt is fed into a text encoder (commonly a CLIP model, and in some architectures a T5 model) which produces embeddings that capture the semantic content of the description.
- During the iterative denoising process, the model uses this embedding to guide noise removal toward an image that matches the text description.
- The negative prompt, when provided, is encoded in the same way. In classifier-free guidance, the negative embedding typically takes the place of the unconditional (empty-prompt) embedding, so each denoising step is pushed toward the prompt and away from the negative prompt, suppressing the unwanted visual concepts.
- The strength of guidance is controlled by an internal classifier-free guidance scale: higher values enforce closer adherence to the prompt but may reduce diversity.
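The guidance step described above is commonly written as follows. This is the standard classifier-free guidance formulation as used in many open diffusion pipelines; whether each model hosted on Together AI applies exactly this form is an assumption, and the symbols are introduced here for illustration.

```latex
\hat{\epsilon}_\theta(x_t, c, c_{\text{neg}})
  = \epsilon_\theta(x_t, c_{\text{neg}})
  + s \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, c_{\text{neg}}) \bigr)
```

Here $x_t$ is the noisy image at step $t$, $c$ is the prompt embedding, $c_{\text{neg}}$ is the negative prompt embedding (or the empty-prompt embedding when no negative prompt is given), $\epsilon_\theta$ is the model's noise prediction, and $s$ is the guidance scale. With $s = 1$ the expression reduces to the plain conditional prediction $\epsilon_\theta(x_t, c)$; larger $s$ amplifies the difference between the two predictions, which is why higher values follow the prompt more closely at the cost of diversity.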
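Python sketch:
def construct_image_prompt(subject, style, details, unwanted_elements):
    # Most important elements first: subject, then style, then details.
    prompt = ", ".join([subject, style, *details])
    negative_prompt = ", ".join(unwanted_elements)
    return prompt, negative_prompt

# Example:
prompt, negative_prompt = construct_image_prompt(
    "A serene mountain lake at sunset", "photorealistic", ["8k resolution"],
    ["blurry", "low quality", "watermark", "text"],
)
# prompt == "A serene mountain lake at sunset, photorealistic, 8k resolution"
# negative_prompt == "blurry, low quality, watermark, text"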