Workflow:Openai Openai python Responses API Text Generation

Knowledge Sources	OpenAI Python SDK, OpenAI Responses API Reference, OpenAI Structured Outputs Guide
Domains	LLMs, Text_Generation, API_Integration
Last Updated	2026-02-15 10:00 GMT

Overview

End-to-end process for generating text responses using the OpenAI Responses API, the primary modern interface for text generation with support for streaming, structured outputs, tool calling, and background execution.

Description

This workflow documents the Responses API, the primary and recommended API for interacting with OpenAI models. Whereas the Chat Completions API takes a messages array, the Responses API takes a single input parameter (a string or a message list) and returns a list of typed output items. It provides built-in support for streaming via the .stream() helper, structured output parsing via .parse(), Pydantic-based tool definitions, background response generation with resumable streams, and input token counting.

Usage

Use this workflow for text generation with the Responses API, which is the recommended path for new applications. It covers simple text generation, streamed responses with typed event handling, structured output parsing into Pydantic models, function tool calling, and background response creation that can be resumed after the client disconnects.

Execution Steps

Step 1: Client Initialization

Instantiate the OpenAI or AsyncOpenAI client. By default the client reads the API key from the OPENAI_API_KEY environment variable. Optional settings such as timeout, max_retries, and a custom http_client can be passed to the constructor as needed.

Key considerations:

Use environment variables for API key management
AsyncOpenAI is required for async workflows
The Responses API is accessed via client.responses
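
A minimal sketch of the initialization step, assuming the openai package is installed. The helper names build_client_options and make_client and the default values are illustrative choices, not part of the SDK; timeout and max_retries are keyword arguments of the OpenAI constructor.

```python
import os


def build_client_options(timeout_s: float = 30.0, max_retries: int = 3) -> dict:
    """Collect keyword arguments for OpenAI(...) or AsyncOpenAI(...).

    The defaults here are illustrative assumptions, not SDK defaults.
    """
    return {"timeout": timeout_s, "max_retries": max_retries}


def make_client(**overrides):
    """Create a synchronous client that reads OPENAI_API_KEY from the environment."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from openai import OpenAI

    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before creating the client")
    options = {**build_client_options(), **overrides}
    return OpenAI(**options)
```

Calling make_client(timeout=10.0) would return a client whose Responses API methods are available under client.responses. For async code, the same options can be passed to AsyncOpenAI instead.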

Step 2: Define Output Structure (Optional)

If using structured outputs, define a Pydantic model class representing the desired response schema. For tool calling, define Pydantic models for each tool's arguments and convert them to tool definitions using openai.pydantic_function_tool(). These schemas constrain the model's output to valid, parseable structures.

Key considerations:

Pydantic models are automatically converted to strict JSON schemas
Tool definitions support enum types, nested models, and union types
The text_format parameter on .parse() and .stream() sets the output schema
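
The schema definitions described above might look like the following sketch, assuming pydantic and openai are installed. The model names (Step, MathResponse, Unit, GetWeather) and their fields are hypothetical examples; only openai.pydantic_function_tool() comes from the SDK.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    """Hypothetical structured-output schema with a nested model."""
    steps: List[Step]
    final_answer: str


class Unit(str, Enum):
    celsius = "celsius"
    fahrenheit = "fahrenheit"


class GetWeather(BaseModel):
    """Hypothetical tool arguments: look up the current weather for a city."""
    city: str
    unit: Unit


def build_tools() -> list:
    """Convert the Pydantic argument model into a tool definition."""
    # Imported lazily so the schema classes can be used without the SDK.
    import openai

    return [openai.pydantic_function_tool(GetWeather)]
```

MathResponse would then be passed as text_format, and build_tools() as tools, to .parse() or .stream().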

Step 3: Create Response

Call client.responses.create() for basic text generation, .parse() for structured outputs, or .stream() for streaming. Pass the input (string or message list), model name, and optional parameters like instructions, tools, text_format, and background. For background mode, set background=True to allow the response to continue generating after client disconnect.

Key considerations:

.create() returns a complete Response object
.stream() returns a context manager yielding typed streaming events
.parse() returns a response whose output items carry parsed data (.parsed on text content items, .parsed_arguments on function call items)
Background mode enables long-running or resumable responses
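
As a rough sketch of the create step, the two helpers below wrap client.responses.create() for a plain request and for a background request. The function names and the default model name "gpt-4o-mini" are assumptions for illustration; the client argument is expected to be an OpenAI instance.

```python
from typing import Any, Optional


def generate_text(
    client: Any,
    prompt: str,
    model: str = "gpt-4o-mini",  # assumed model name; substitute your own
    instructions: Optional[str] = None,
) -> str:
    """Create a response and return its aggregated text."""
    kwargs = {"model": model, "input": prompt}
    if instructions is not None:
        kwargs["instructions"] = instructions
    response = client.responses.create(**kwargs)
    return response.output_text


def start_background(client: Any, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Start a background response and return its id for later retrieval."""
    response = client.responses.create(model=model, input=prompt, background=True)
    return response.id
```

The id returned by start_background is what a later client.responses.retrieve() call would use to fetch or resume the response.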

Step 4: Process Response

For standard responses, access response.output_text for simple text or iterate response.output for structured output items. For streaming, iterate over events and filter by event.type (e.g., response.output_text.delta, response.completed). For parsed responses, access .parsed on text content items or .parsed_arguments on function call items. For background responses, use client.responses.retrieve() with stream=True and starting_after to resume.
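
For parsed responses, the traversal described above can be sketched as follows. The function name extract_parsed is hypothetical; the item type strings ("message", "function_call", "output_text") are assumed to match the output item types the SDK returns.

```python
from typing import Any, List, Tuple


def extract_parsed(response: Any) -> Tuple[List[Any], List[Tuple[str, Any]]]:
    """Return (parsed text payloads, [(tool name, parsed arguments), ...])."""
    parsed_texts: List[Any] = []
    tool_calls: List[Tuple[str, Any]] = []
    for item in response.output:
        if item.type == "message":
            for part in item.content:
                # Refusals and other content types carry no parsed payload.
                if part.type == "output_text" and part.parsed is not None:
                    parsed_texts.append(part.parsed)
        elif item.type == "function_call":
            tool_calls.append((item.name, item.parsed_arguments))
    return parsed_texts, tool_calls
```

Output items of other types are skipped, so the helper tolerates responses that mix text, tool calls, and other items.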

Key considerations:

Streaming events are strongly typed (e.g., ResponseTextDeltaEvent)
The .get_final_response() method on streams returns the complete Response after streaming
Background responses can be resumed from a specific sequence number
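
A small sketch of event handling for the streaming case: it concatenates text deltas, notes whether response.completed was seen, and remembers the last sequence number so a background stream could be resumed from that point. StreamSummary and summarize_stream are illustrative names, not SDK types.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class StreamSummary:
    text: str
    completed: bool
    last_sequence_number: Optional[int]


def summarize_stream(events: Iterable[Any]) -> StreamSummary:
    """Consume streaming events and collect the text plus resume position."""
    chunks = []
    completed = False
    last_seq: Optional[int] = None
    for event in events:
        # Keep the previous value if an event lacks a sequence number.
        last_seq = getattr(event, "sequence_number", last_seq)
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type == "response.completed":
            completed = True
    return StreamSummary("".join(chunks), completed, last_seq)
```

With a real client, the iterable would be the stream object yielded by client.responses.stream(...) used as a context manager. When resuming a background response, it would be the result of client.responses.retrieve() called with stream=True and starting_after set to the saved last_sequence_number.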

Step 5: Count Input Tokens (Optional)

Before sending a request, optionally count the input tokens using client.responses.input_tokens.count() with the same parameters you would pass to .create(). This returns the token count without generating a response, useful for cost estimation and context window management.

Key considerations:

Uses the same parameters as .create() for accurate counting
Helps manage context window limits before making requests
Available as a separate endpoint, not part of the generation flow
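
One way to use the count as a pre-flight check is sketched below. The helper name fits_budget, the default model name, and the max_input_tokens budget are hypothetical. The sketch also assumes the object returned by client.responses.input_tokens.count() exposes the total as an input_tokens attribute.

```python
from typing import Any


def fits_budget(
    client: Any,
    prompt: str,
    model: str = "gpt-4o-mini",  # assumed model name; substitute your own
    max_input_tokens: int = 8000,  # hypothetical budget chosen by the caller
) -> bool:
    """Return True if the counted input tokens stay within the budget."""
    # Pass the same parameters that would later go to .create().
    result = client.responses.input_tokens.count(model=model, input=prompt)
    # Assumption: the result carries the total as .input_tokens.
    return result.input_tokens <= max_input_tokens
```

A caller might trim or summarize the input when fits_budget returns False, then send the request with .create() once it passes.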

Execution Diagram

GitHub URL

Workflow Repository