Voice Agents
Real-time phone agents and the streaming pipeline behind them.
The voice pipeline
A live call flows through a fully streaming pipeline so the agent starts replying in well under a second:
- Telephony (Twilio) streams 8 kHz μ-law audio to our voice server over a WebSocket.
- Speech-to-Text (Deepgram, with AssemblyAI as failover) transcribes the caller in real time.
- LLM (any of the 8 providers) generates the reply, calling your tools when needed.
- Text-to-Speech (ElevenLabs, OpenAI, Deepgram Aura, or Cartesia) speaks the answer, streamed straight back to the caller.
Audio begins playing while the reply is still being generated, and tool calls run in parallel, which keeps time-to-first-audio low.
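To make the first stage concrete, here is a minimal sketch of turning one incoming telephony frame into PCM samples that an STT engine could consume. It assumes the WebSocket messages follow the Twilio Media Streams shape (a JSON object with an `event` field and base64 audio in `media.payload`). The function names are illustrative and this is not the voice server's actual code. The μ-law expansion itself is the standard G.711 algorithm.

```python
import base64
import json
from typing import List, Optional

SAMPLE_RATE_HZ = 8000  # telephony audio rate stated in the pipeline above


def ulaw_to_pcm16(data: bytes) -> List[int]:
    """Expand G.711 mu-law bytes into signed 16-bit PCM sample values."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF               # mu-law bytes are stored bit-inverted
        sign = u & 0x80                # top bit: 1 means negative
        exponent = (u >> 4) & 0x07     # 3-bit segment number
        mantissa = u & 0x0F            # 4-bit step within the segment
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if sign else magnitude)
    return samples


def decode_media_message(raw: str) -> Optional[List[int]]:
    """Return PCM samples for a 'media' event, or None for any other event."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None
    return ulaw_to_pcm16(base64.b64decode(msg["media"]["payload"]))


# Usage: a 160-byte frame of mu-law silence (0xFF decodes to 0).
frame = bytes([0xFF]) * 160
raw = json.dumps({"event": "media",
                  "media": {"payload": base64.b64encode(frame).decode()}})
pcm = decode_media_message(raw)
print(len(pcm), len(pcm) * 1000 / SAMPLE_RATE_HZ)  # 160 20.0
```

At 8 kHz with one byte per sample, 160 bytes correspond to 20 ms of audio, so frames this small can be forwarded to STT as they arrive rather than buffered.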
Latency budget
A natural-feeling call keeps the gap between the caller finishing a sentence and the agent's first audio under ~800 ms. Rough p50 contributions per turn:
| Stage | Typical p50 | Notes |
|---|---|---|
| Endpointing (silence detection) | ~200 ms | Configurable via STT endpointing |
| Speech-to-Text | ~100–150 ms | Deepgram Nova streaming |
| LLM time-to-first-token | ~180–900 ms | Depends on the model; see the Models page |
| Text-to-Speech first audio | ~75–150 ms | ElevenLabs Flash v2.5 is the fastest |
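The table can be checked against the ~800 ms target with simple arithmetic. The sketch below sums the low and high ends of each range and works out how much time is left for the LLM. Summing per-stage p50 values is only an approximation of the end-to-end p50, and the variable names are illustrative.

```python
# (low, high) p50 estimates in milliseconds, copied from the table above.
STAGES_MS = {
    "endpointing": (200, 200),
    "speech_to_text": (100, 150),
    "llm_first_token": (180, 900),
    "tts_first_audio": (75, 150),
}
TARGET_MS = 800


def turn_latency_range(stages):
    """Sum the low ends and the high ends of every stage."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def llm_headroom(stages, target_ms):
    """Time left for LLM first token once the other stages are paid for.

    Returns (worst case, best case): the other stages at their high ends,
    then at their low ends.
    """
    others = {k: v for k, v in stages.items() if k != "llm_first_token"}
    low, high = turn_latency_range(others)
    return target_ms - high, target_ms - low


print(turn_latency_range(STAGES_MS))       # (555, 1400)
print(llm_headroom(STAGES_MS, TARGET_MS))  # (300, 425)
```

On these numbers the target holds only when the LLM produces its first token within roughly 300–425 ms, which makes model choice the largest lever in the budget.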
Reliability
- Per-provider failover — LLM, STT, and TTS each fail over to a configured backup if the primary errors.
- Auto-reconnect — the STT socket reconnects with backoff; a graceful fallback message plays if speech can't recover.
- A max-duration safety timer ends runaway calls.
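The failover and reconnect behaviour above follows two common patterns, sketched here in generic form. All names, delays, and attempt counts are assumptions chosen for illustration; the platform's real values and code are not documented on this page.

```python
import time
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 0.25, factor: float = 2.0,
                   cap: float = 4.0, attempts: int = 5) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except Exception as err:  # any provider error triggers the backup
            last_error = err
    raise RuntimeError("all providers failed") from last_error


def reconnect(connect: Callable[[], T], sleep=time.sleep, **backoff) -> Optional[T]:
    """Retry `connect`, sleeping between failures; None means give up.

    A None result is where a caller would play the fallback message.
    """
    for delay in backoff_delays(**backoff):
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
    return None


# Usage with stand-in providers (hypothetical, no network involved).
def flaky_primary() -> str:
    raise TimeoutError("primary TTS unavailable")


print(call_with_failover([flaky_primary, lambda: "backup reply"]))  # backup reply
print(list(backoff_delays()))  # [0.25, 0.5, 1.0, 2.0, 4.0]
```

Capping the delay keeps a long outage from pushing retries further and further apart, which matters on a live call where the caller is waiting.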
Languages & voices
Agents support English, German, Spanish, Mandarin, and French. Pick the language, STT model, TTS provider/voice, and optional LLM/STT/TTS fallbacks in the agent's Voice settings.
Call recordings
Turn on Recording in the agent's Overview to capture calls. When recording is enabled, each call is recorded dual-channel; on hangup the recording is copied into private storage and attached to the call in Logs, where you can play it back or download it. Recordings are accessible only to the owner, through short-lived signed links.
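For readers unfamiliar with signed links, the general technique can be sketched as an expiring HMAC signature over the file path. This illustrates the idea only: the key, path, and query parameter names are hypothetical, the platform's actual mechanism is not described here, and most storage services provide an equivalent feature built in. An ownership check would happen before a link is issued.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SECRET = b"replace-with-a-real-secret"  # hypothetical signing key


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, as a hex string."""
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_link(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return `path` with an expiry timestamp and signature appended."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def verify_link(link: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    expected = _signature(parts.path, expires)
    # compare_digest avoids leaking match position through timing
    return current < expires and hmac.compare_digest(expected.encode(), sig.encode())


# Usage: a link valid for 5 minutes from a fixed clock value.
link = sign_link("/recordings/call-123.wav", ttl_seconds=300, now=1000.0)
print(verify_link(link, now=1200.0))  # True
print(verify_link(link, now=1300.0))  # False
```

Because the expiry is part of the signed data, extending the lifetime or pointing the link at a different file invalidates the signature.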
Set a retention period under Settings → GDPR & Privacy to auto-delete recordings after 7–365 days, or keep them indefinitely. You can also delete an individual recording from its call log.
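The retention rule can be expressed as a small check. This is a sketch under stated assumptions: `None` stands for "keep indefinitely", the function names are hypothetical, and only the 7–365 day range comes from the setting described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_DAYS, MAX_DAYS = 7, 365  # allowed range from the retention setting


def validate_retention(days: Optional[int]) -> Optional[int]:
    """Accept None (keep indefinitely) or a value within the allowed range."""
    if days is not None and not MIN_DAYS <= days <= MAX_DAYS:
        raise ValueError(f"retention must be {MIN_DAYS}-{MAX_DAYS} days or None")
    return days


def is_expired(recorded_at: datetime, retention_days: Optional[int],
               now: datetime) -> bool:
    """True once a recording has outlived its retention period."""
    if retention_days is None:
        return False
    return now >= recorded_at + timedelta(days=retention_days)


# Usage: a recording made on 1 January with 30-day retention.
recorded = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_expired(recorded, 30, datetime(2025, 1, 30, tzinfo=timezone.utc)))    # False
print(is_expired(recorded, 30, datetime(2025, 1, 31, tzinfo=timezone.utc)))    # True
print(is_expired(recorded, None, datetime(2035, 1, 1, tzinfo=timezone.utc)))   # False
```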