Principle: OpenAI Text-to-Speech
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Synthesis |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A principle for synthesizing natural-sounding speech from text input using neural text-to-speech models with configurable voice, speed, and output format.
Description
Text-to-Speech (TTS) converts written text into spoken audio. The system takes a text string, a model selection (standard or HD quality), and a voice identity, then produces an audio stream in the requested format. This enables voice interfaces, accessibility features, content narration, and interactive applications.
Key configuration dimensions include voice selection (multiple distinct voices with different characteristics), output format (MP3, Opus, AAC, FLAC, WAV, PCM), speed control (0.25x to 4.0x), and optional style instructions for advanced models.
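The constraints above can be captured in a small validation helper. This is a hypothetical sketch (not an official client); the parameter names, the 4096-character input limit, the 0.25–4.0 speed range, and the format list all come from the description in this section.

```python
# Hypothetical request-validation helper mirroring the configuration
# dimensions described above; not part of any official SDK.

VALID_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096

def build_speech_request(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Assemble and validate a TTS request payload."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1..{MAX_INPUT_CHARS} characters")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "input": text,
        "model": model,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Validating locally before sending avoids a round trip for requests the service would reject anyway.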
Usage
Use this principle when your application needs to convert text to audio. Common scenarios include chatbot voice responses, content narration, accessibility features, and interactive voice applications.
Theoretical Basis
TTS follows a Text → Model → Audio Stream pipeline:
    function synthesizeSpeech(text, model, voice, format):
        // 1. Text normalization and preprocessing
        // 2. Neural model generates the audio waveform
        // 3. Audio is encoded in the requested format
        response = await api.post('/audio/speech', {
            input: text,             // Max 4096 characters
            model: model,            // 'tts-1' (fast) or 'tts-1-hd' (quality)
            voice: voice,            // 'alloy', 'echo', 'nova', etc.
            response_format: format, // 'mp3', 'opus', 'wav', etc.
            speed: 1.0,              // 0.25 to 4.0
        })
        return response.body         // Binary audio stream