Implementation:Openai Openai node Speech Resource

Knowledge Sources	Openai_Openai_node
Domains	SDK, Audio, Text_to_Speech
Last Updated	2026-02-15 12:00 GMT

Overview

The Speech resource class provides the create method for generating audio from input text via the OpenAI text-to-speech API.

Description

The Speech class extends APIResource and exposes a single create method that sends a POST request to the /audio/speech endpoint. The method accepts a SpeechCreateParams body and returns a binary Response object wrapped in an APIPromise. The response contains raw audio data in the requested format (defaulting to MP3), which can be consumed as a blob, array buffer, or streamed.
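
The consumption options described above can be sketched with the web-standard Response API. This is a minimal illustration, not SDK code: `readAllBytes`, `readInChunks`, and `fakeAudioResponse` are hypothetical helper names, and the fake response stands in for the value returned by `create` so the sketch runs without a network call. It assumes Node 18+ or a browser, where `Response` is a global. Note that a response body can be read only once, so choose one consumption style per response.

```typescript
// Read the entire audio body into memory as bytes.
async function readAllBytes(response: Response): Promise<Uint8Array> {
  return new Uint8Array(await response.arrayBuffer());
}

// Read the body chunk by chunk (for example, to feed an audio player)
// and return the total number of bytes seen.
async function readInChunks(
  response: Response,
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  if (!response.body) return 0;
  const reader = response.body.getReader();
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(value);
    total += value.byteLength;
  }
  return total;
}

// Hypothetical stand-in for the Response returned by the SDK:
// four arbitrary bytes labelled as MP3 audio.
function fakeAudioResponse(): Response {
  return new Response(new Uint8Array([0x49, 0x44, 0x33, 0x04]), {
    headers: { 'Content-Type': 'audio/mpeg' },
  });
}
```

With a real client, the same helpers would presumably be called on the result of `client.audio.speech.create(...)` in place of `fakeAudioResponse()`.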

The class supports multiple TTS models including tts-1, tts-1-hd, gpt-4o-mini-tts, and gpt-4o-mini-tts-2025-12-15. It offers a selection of built-in voices such as alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, and cedar. Additional parameters allow customization of output format, playback speed, voice instructions (for newer models), and stream format.

The request sets the Accept: application/octet-stream header and uses the __binaryResponse flag to indicate that the response should be treated as binary data rather than parsed as JSON.
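
To make the request shape concrete, the following sketch builds an approximately equivalent raw HTTP request. It is an illustration under stated assumptions rather than the SDK's internals: `buildSpeechRequest` and `SpeechRequestBodySketch` are hypothetical names, the base URL `https://api.openai.com/v1` is assumed to be the default, and the SDK adds further headers and retry logic that are not shown here.

```typescript
// Hypothetical local shape covering a subset of SpeechCreateParams.
interface SpeechRequestBodySketch {
  input: string;
  model: string;
  voice: string;
  response_format?: string;
}

interface RawRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Build (but do not send) a POST to /audio/speech that asks for binary audio.
function buildSpeechRequest(body: SpeechRequestBodySketch, apiKey: string): RawRequest {
  return {
    url: 'https://api.openai.com/v1/audio/speech', // assumed default base URL
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
        // Signals that raw bytes, not JSON, are expected back.
        Accept: 'application/octet-stream',
      },
      body: JSON.stringify(body),
    },
  };
}
```

Passing `request.url` and `request.init` to `fetch` would send the request; the response should then be read as binary data (for example with `arrayBuffer()`) rather than with `json()`.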

Usage

Use the Speech resource when you need to convert text to spoken audio. Access it via client.audio.speech.create() and provide the input text, a TTS model, and a voice selection. The resulting binary response can be saved to a file or streamed to an audio player.

Code Reference

Source Location

Repository: openai-node
File: src/resources/audio/speech.ts

Signature

export class Speech extends APIResource {
  create(body: SpeechCreateParams, options?: RequestOptions): APIPromise<Response>;
}

export type SpeechModel = 'tts-1' | 'tts-1-hd' | 'gpt-4o-mini-tts' | 'gpt-4o-mini-tts-2025-12-15';

export interface SpeechCreateParams {
  input: string;
  model: (string & {}) | SpeechModel;
  voice: (string & {}) | 'alloy' | 'ash' | 'ballad' | 'coral' | 'echo'
    | 'sage' | 'shimmer' | 'verse' | 'marin' | 'cedar';
  instructions?: string;
  response_format?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav' | 'pcm';
  speed?: number;
  stream_format?: 'sse' | 'audio';
}

Import

import OpenAI from 'openai';

I/O Contract

Inputs

Name	Type	Required	Description
input	`string`	Yes	The text to generate audio for (max 4096 characters)
model	`string | SpeechModel`	Yes	TTS model to use (e.g., `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`)
voice	`string`	Yes	The voice to use, typically a built-in name (e.g., `alloy`, `ash`, `coral`, `shimmer`)
instructions	`string`	No	Additional voice control instructions (not supported for `tts-1` or `tts-1-hd`)
response_format	`string`	No	Audio output format: `mp3` (default), `opus`, `aac`, `flac`, `wav`, or `pcm`
speed	`number`	No	Playback speed from 0.25 to 4.0 (default 1.0)
stream_format	`string`	No	Stream format: `sse` or `audio`
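
The constraints listed in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the SDK: `checkSpeechParams` and `SpeechParamsSketch` are illustrative names, and it assumes that the 4096-character limit can be approximated with JavaScript's `string.length` (which counts UTF-16 code units). The API remains the authority on what it accepts.

```typescript
// Hypothetical local shape mirroring a subset of SpeechCreateParams.
interface SpeechParamsSketch {
  input: string;
  model: string;
  voice: string;
  instructions?: string;
  speed?: number;
}

const MAX_INPUT_CHARS = 4096;
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;
const MODELS_WITHOUT_INSTRUCTIONS = ['tts-1', 'tts-1-hd'];

// Return a list of human-readable problems; an empty list means no issue was found.
function checkSpeechParams(params: SpeechParamsSketch): string[] {
  const problems: string[] = [];
  if (params.input.length > MAX_INPUT_CHARS) {
    problems.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  // Written as a negated range check so that NaN is also rejected.
  if (params.speed !== undefined && !(params.speed >= MIN_SPEED && params.speed <= MAX_SPEED)) {
    problems.push(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  if (params.instructions !== undefined && MODELS_WITHOUT_INSTRUCTIONS.includes(params.model)) {
    problems.push(`instructions are not supported for ${params.model}`);
  }
  return problems;
}
```

A caller might run this check first and only invoke `client.audio.speech.create()` when the returned list is empty.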

Outputs

Name	Type	Description
Response	`Response`	A binary HTTP Response object containing the generated audio data

Usage Examples

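import fs from 'fs';
import OpenAI from 'openai';

// The client reads the API key from the OPENAI_API_KEY environment variable.
const client = new OpenAI();

const speech = await client.audio.speech.create({
  input: 'Today is a wonderful day to build something people love!',
  model: 'tts-1',
  voice: 'alloy',
});

// The response body can be read only once, so use one of the options below.

// Option 1: get the audio as a Blob
// const blob = await speech.blob();
// console.log('Audio blob size:', blob.size);

// Option 2: save the audio to a file in Node.js
const buffer = Buffer.from(await speech.arrayBuffer());
await fs.promises.writeFile('output.mp3', buffer);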
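
For longer audio, buffering the whole body in memory may be undesirable. The sketch below streams the response body to disk instead. It is a hedged illustration: `saveResponseToFile` is a hypothetical helper name, and the code assumes Node 18+ with a web-standard `Response` whose `body` is a web `ReadableStream` (as with native `fetch`). Older SDK versions that use a different fetch implementation may expose a different body type.

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Stream a Response body to a file and return the number of bytes written.
async function saveResponseToFile(response: Response, path: string): Promise<number> {
  if (!response.body) throw new Error('Response has no body');
  const out = createWriteStream(path);
  // Readable.fromWeb adapts a web ReadableStream to a Node.js stream;
  // the cast bridges the DOM and Node type definitions for ReadableStream.
  await pipeline(Readable.fromWeb(response.body as any), out);
  return out.bytesWritten;
}
```

With a real client, this would presumably be called as `await saveResponseToFile(speech, 'output.mp3')` in place of the `arrayBuffer()` approach.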
