Heuristic:Elevenlabs Elevenlabs python VAD vs Manual Commit Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Realtime_STT |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
Decision framework for choosing between VAD (automatic) and MANUAL commit strategies in real-time speech-to-text transcription.
Description
The ElevenLabs Scribe Realtime API supports two commit strategies that control when partial transcriptions become final committed transcripts. VAD (Voice Activity Detection) automatically commits when it detects the speaker has stopped talking. MANUAL requires the client to explicitly call `commit()` to finalize a segment. The default is MANUAL, putting full control in the client's hands.
Usage
Use this heuristic when configuring real-time speech-to-text via `ScribeRealtime.connect()`. Choose between VAD and MANUAL based on your application's requirements for automation vs. control over segment boundaries.
The Insight (Rule of Thumb)
- VAD (`CommitStrategy.VAD`): Use when you want automatic segmentation based on speech pauses.
- Configure `vad_silence_threshold_secs` (0.3-3.0s) to control how long a pause triggers a commit.
- Configure `vad_threshold` (0.1-0.9) to adjust sensitivity to speech vs. noise.
- Configure `min_speech_duration_ms` (50-2000ms) to ignore very short utterances.
- Configure `min_silence_duration_ms` (50-2000ms) to require minimum silence before commit.
- Best for: Hands-free applications, voice assistants, meeting transcription.
- MANUAL (`CommitStrategy.MANUAL`): Use when you need precise control over segment boundaries.
- Call `connection.commit()` to explicitly finalize a segment.
- Commit often to keep latency low (as noted in the API docs).
- Best for: Push-to-talk interfaces, dictation with explicit sentence endings, applications with custom segmentation logic.
- Default: MANUAL is the default when no `commit_strategy` is specified.
- Trade-off: VAD adds convenience but may split sentences mid-thought during natural pauses. MANUAL gives full control but requires application-level logic to determine when to commit.
Reasoning
VAD is the simpler choice for most applications since it handles segmentation automatically. However, VAD can produce awkward segment boundaries during natural speech pauses (e.g., thinking pauses), leading to fragmented transcripts. MANUAL mode avoids this by letting the application define meaningful boundaries (e.g., user releases a button, end of a question).
The tunable VAD parameters allow balancing between responsiveness (short silence threshold) and accuracy (longer threshold to avoid premature commits). The `min_speech_duration_ms` parameter helps filter out background noise or accidental sounds.
Code Evidence
CommitStrategy enum definition in `realtime/scribe.py:32-40`:
class CommitStrategy(str, Enum):
"""
Strategy for committing transcription results.
VAD: Voice Activity Detection - automatically commits when speech ends
MANUAL: Manual commit - requires calling commit() to commit the segment
"""
VAD = "vad"
MANUAL = "manual"
Default to MANUAL in `realtime/scribe.py:192`:
commit_strategy = options.get("commit_strategy", CommitStrategy.MANUAL)
VAD parameter constraints documented in `realtime/scribe.py:52-55`:
vad_silence_threshold_secs: Silence threshold in seconds for VAD (must be between 0.3 and 3.0)
vad_threshold: Threshold for voice activity detection (must be between 0.1 and 0.9)
min_speech_duration_ms: Minimum speech duration in milliseconds (must be between 50 and 2000)
min_silence_duration_ms: Minimum silence duration in milliseconds (must be between 50 and 2000)
Manual commit recommendation in `realtime/connection.py:181`:
"""
Commits the segment, triggering a COMMITTED_TRANSCRIPT event and clearing the buffer.
It's recommend to commit often when using CommitStrategy.MANUAL to keep latency low.
"""