
Implementation:Openai Openai node Speech Resource

From Leeroopedia
Domains: SDK, Audio, Text_to_Speech
Last Updated: 2026-02-15 12:00 GMT

Overview

The Speech resource class provides the create method for generating audio from input text via the OpenAI text-to-speech API.

Description

The Speech class extends APIResource and exposes a single create method that sends a POST request to the /audio/speech endpoint. The method accepts a SpeechCreateParams body and returns an APIPromise that resolves to a standard Fetch Response object. The response body contains the raw audio bytes in the requested format (MP3 by default). It can be read with blob() or arrayBuffer(), or streamed through body, but like any Fetch Response it can be consumed only once.
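Because the result is an ordinary Fetch Response, reading its body a second time (for example calling blob() and then arrayBuffer() on the same object) rejects. The sketch below reads the bytes exactly once; it assumes Node.js 18+ (global Response), the helper name readAudioBytes is hypothetical, and a synthetic Response stands in for the API result.

```typescript
// Hypothetical helper: read the whole audio body into memory exactly once.
async function readAudioBytes(res: Response): Promise<Uint8Array> {
  if (res.bodyUsed) {
    // A Fetch Response body is a one-shot stream; a second read would reject.
    throw new Error('Response body has already been consumed');
  }
  return new Uint8Array(await res.arrayBuffer());
}

// Synthetic stand-in for the SDK's result: four fake "audio" bytes.
const fake = new Response(new Uint8Array([0x49, 0x44, 0x33, 0x04]));

readAudioBytes(fake).then((bytes) => {
  console.log(bytes.length, fake.bodyUsed); // 4 true
});
```

If you need the audio in two forms, read it once and derive the second form from the bytes (or Blob) you already have.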

The model parameter accepts tts-1, tts-1-hd, gpt-4o-mini-tts, and gpt-4o-mini-tts-2025-12-15 (or any other model name as a string), and the voice parameter accepts built-in voices such as alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, and cedar. Optional parameters control the output format, playback speed, voice instructions (not supported by tts-1 or tts-1-hd), and stream format.
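The model-dependent options can be checked before a request is sent. A minimal sketch: checkModelOptions is a hypothetical helper, and the types are declared locally so the example does not need the openai package. It encodes two restrictions from the OpenAI API reference: instructions is not supported by tts-1 or tts-1-hd, and stream_format 'sse' is likewise unsupported on those two models.

```typescript
// Local mirror of a subset of the documented parameters (not imported from the SDK).
type SpeechModel = 'tts-1' | 'tts-1-hd' | 'gpt-4o-mini-tts' | 'gpt-4o-mini-tts-2025-12-15';

interface SpeechOptions {
  model: (string & {}) | SpeechModel;
  instructions?: string;
  stream_format?: 'sse' | 'audio';
}

// The two legacy models reject both instructions and SSE streaming.
const LEGACY_MODELS: ReadonlySet<string> = new Set(['tts-1', 'tts-1-hd']);

// Hypothetical helper: returns a list of problems (empty when the combination is allowed).
function checkModelOptions(opts: SpeechOptions): string[] {
  const problems: string[] = [];
  if (LEGACY_MODELS.has(opts.model)) {
    if (opts.instructions !== undefined) {
      problems.push(`instructions are not supported by ${opts.model}`);
    }
    if (opts.stream_format === 'sse') {
      problems.push(`stream_format 'sse' is not supported by ${opts.model}`);
    }
  }
  return problems;
}

console.log(checkModelOptions({ model: 'tts-1', instructions: 'Speak calmly.' }));
// [ 'instructions are not supported by tts-1' ]
console.log(checkModelOptions({ model: 'gpt-4o-mini-tts', stream_format: 'sse' }));
// []
```

Returning a list instead of throwing lets a caller report every incompatible option at once.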

The request sets the Accept: application/octet-stream header and passes the __binaryResponse flag, so the SDK returns the response as binary data instead of parsing it as JSON.
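For comparison, the HTTP call can be approximated with plain fetch. This is a sketch, not the SDK's internal code: it assumes the public endpoint https://api.openai.com/v1/audio/speech and Bearer-token authentication, and buildSpeechRequest is a hypothetical helper. It only builds the request, so nothing is sent until the result is passed to fetch.

```typescript
// Hypothetical helper: build a fetch() call roughly equivalent to speech.create().
function buildSpeechRequest(
  apiKey: string,
  body: { input: string; model: string; voice: string; response_format?: string },
): { url: string; init: RequestInit } {
  return {
    url: 'https://api.openai.com/v1/audio/speech',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
        // Ask for raw audio bytes rather than JSON, mirroring the SDK's Accept header.
        Accept: 'application/octet-stream',
      },
      body: JSON.stringify(body),
    },
  };
}

const { url, init } = buildSpeechRequest('sk-test', {
  input: 'Hello!',
  model: 'tts-1',
  voice: 'alloy',
});
console.log(init.method, url); // POST https://api.openai.com/v1/audio/speech
// To send it: const res = await fetch(url, init); const bytes = await res.arrayBuffer();
```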

Usage

Use the Speech resource to convert text to spoken audio. Call client.audio.speech.create() with the input text, a TTS model, and a voice. The resulting binary response can be saved to a file or streamed to an audio player.
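To save the audio without buffering the whole file in memory, the response body can be copied to disk chunk by chunk. A minimal sketch for Node.js 18+; streamToFile is a hypothetical helper, and any Fetch Response (including the one returned by client.audio.speech.create()) can be passed to it.

```typescript
import { open } from 'node:fs/promises';

// Hypothetical helper: copy a Response body to a file chunk by chunk.
async function streamToFile(res: Response, path: string): Promise<number> {
  if (!res.body) throw new Error('Response has no body');
  const reader = res.body.getReader();
  const file = await open(path, 'w');
  let written = 0;
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Each chunk is a Uint8Array; write it before reading the next one.
      const { bytesWritten } = await file.write(value);
      written += bytesWritten;
    }
  } finally {
    await file.close();
  }
  return written;
}

// Usage with the SDK (needs an API key and network access):
// const speech = await client.audio.speech.create({ input: 'Hi', model: 'tts-1', voice: 'alloy' });
// await streamToFile(speech, 'speech.mp3');
```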

Code Reference

Source Location

Signature

export class Speech extends APIResource {
  create(body: SpeechCreateParams, options?: RequestOptions): APIPromise<Response>;
}

export type SpeechModel = 'tts-1' | 'tts-1-hd' | 'gpt-4o-mini-tts' | 'gpt-4o-mini-tts-2025-12-15';

export interface SpeechCreateParams {
  input: string;
  model: (string & {}) | SpeechModel;
  voice: (string & {}) | 'alloy' | 'ash' | 'ballad' | 'coral' | 'echo'
    | 'sage' | 'shimmer' | 'verse' | 'marin' | 'cedar';
  instructions?: string;
  response_format?: 'mp3' | 'opus' | 'aac' | 'flac' | 'wav' | 'pcm';
  speed?: number;
  stream_format?: 'sse' | 'audio';
}

Import

import OpenAI from 'openai';

I/O Contract

Inputs

Name            | Type                  | Required | Description
input           | string                | Yes      | The text to generate audio for (maximum 4096 characters)
model           | SpeechModel or string | Yes      | TTS model to use (e.g., tts-1, tts-1-hd, gpt-4o-mini-tts)
voice           | string                | Yes      | Voice to use (e.g., alloy, ash, coral, shimmer)
instructions    | string                | No       | Additional instructions that control the voice (not supported by tts-1 or tts-1-hd)
response_format | string                | No       | Audio output format: mp3 (default), opus, aac, flac, wav, or pcm
speed           | number                | No       | Playback speed from 0.25 to 4.0 (default 1.0)
stream_format   | string                | No       | Stream format: sse or audio (sse is not supported by tts-1 or tts-1-hd)
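The length and speed limits in the table can be enforced client-side before a request is spent. A minimal sketch: validateSpeechInput is hypothetical, and it uses JavaScript string length (UTF-16 code units) as an approximation of the 4096-character limit, which is an assumption about how the API counts characters.

```typescript
// Limits taken from the Inputs table.
const MAX_INPUT_CHARS = 4096;
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

// Hypothetical helper: throws a RangeError when a parameter is outside the documented range.
function validateSpeechInput(input: string, speed = 1.0): void {
  // String length counts UTF-16 code units, an approximation of the API's character count.
  if (input.length > MAX_INPUT_CHARS) {
    throw new RangeError(`input is ${input.length} characters; the maximum is ${MAX_INPUT_CHARS}`);
  }
  if (!Number.isFinite(speed) || speed < MIN_SPEED || speed > MAX_SPEED) {
    throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}, got ${speed}`);
  }
}

validateSpeechInput('Hello!', 1.25); // passes silently
try {
  validateSpeechInput('x'.repeat(5000));
} catch (e) {
  console.log((e as Error).message); // input is 5000 characters; the maximum is 4096
}
```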

Outputs

Name     | Type     | Description
Response | Response | A binary HTTP Response object containing the generated audio data
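With response_format 'pcm' the body holds raw samples with no header. OpenAI's documentation describes this output as 24 kHz, 16-bit signed little-endian; the sketch below additionally assumes a single (mono) channel. pcmToWav is a hypothetical helper that prepends a standard 44-byte RIFF/WAVE header so ordinary audio players can open the data.

```typescript
// Hypothetical helper: wrap raw PCM samples in a 44-byte WAV (RIFF) header.
// Defaults assume 24 kHz, 16-bit signed little-endian, mono (see lead-in).
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1): Uint8Array {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8);
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.byteLength);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.byteLength, true); // RIFF chunk size = file size - 8
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcm.byteLength, true);
  out.set(pcm, 44); // sample bytes follow the header unchanged
  return out;
}

// Usage with the SDK (needs network access):
// const res = await client.audio.speech.create({ input: 'Hi', model: 'tts-1', voice: 'alloy', response_format: 'pcm' });
// const wav = pcmToWav(new Uint8Array(await res.arrayBuffer()));
```

Requesting response_format 'wav' directly avoids this step; the wrapper is mainly useful when PCM chunks are collected from a stream and saved afterwards.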

Usage Examples

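// Run as an ES module (uses top-level await)
import OpenAI from 'openai';
import * as fs from 'node:fs';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const speech = await client.audio.speech.create({
  input: 'Today is a wonderful day to build something people love!',
  model: 'tts-1',
  voice: 'alloy',
});

// The Response body can be read only once, so read it a single time as a Blob
const blob = await speech.blob();
console.log('Audio blob size:', blob.size);

// Reuse the same Blob to save the MP3 (the default format) to a file in Node.js
const buffer = Buffer.from(await blob.arrayBuffer());
await fs.promises.writeFile('output.mp3', buffer);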
