Voice Agents

Real-time phone agents and the streaming pipeline behind them.

The voice pipeline

A live call flows through a fully streaming pipeline so the agent starts replying in well under a second:

  1. Telephony (Twilio) streams 8 kHz μ-law audio to our voice server over a WebSocket.
  2. Speech-to-Text (Deepgram, with AssemblyAI as failover) transcribes the caller in real time.
  3. LLM (any of the 8 providers) generates the reply, calling your tools when needed.
  4. Text-to-Speech (ElevenLabs, OpenAI, Deepgram Aura, or Cartesia) speaks the answer, streamed straight back to the caller.

Audio begins playing mid-generation and tool calls run in parallel, which keeps time-to-first-audio low.
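
To make step 1 concrete, here is a minimal sketch of decoding one inbound audio frame into 16-bit PCM before it is handed to Speech-to-Text. It assumes the frame arrives as a Twilio Media Streams style JSON message with a base64 μ-law payload under media.payload; the function names are illustrative and are not part of our voice server. The expansion follows the standard G.711 μ-law rule.

```python
import base64
import json
import struct


def ulaw_byte_to_pcm16(b: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~b & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_frame(message: str) -> bytes:
    """Return little-endian PCM16 for a 'media' message, or b'' for other events.

    Assumes the message shape {"event": "media", "media": {"payload": "<base64>"}}.
    """
    msg = json.loads(message)
    if msg.get("event") != "media":
        return b""
    ulaw = base64.b64decode(msg["media"]["payload"])
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

At 8 kHz, each μ-law byte is one sample, so a 20 ms frame is 160 bytes in and 320 bytes of PCM16 out.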

Latency budget

A natural-feeling call keeps round-trip latency under ~800 ms. Rough p50 contributions per turn:

Stage                            Typical p50    Notes
Endpointing (silence detection)  ~200 ms        Configurable via STT endpointing
Speech-to-Text                   ~100–150 ms    Deepgram Nova streaming
LLM time-to-first-token          180–900 ms     Depends on model; see the Models page
Text-to-Speech first audio       ~75–150 ms     ElevenLabs Flash v2.5 is the fastest

Tip · For the snappiest calls, pair a fast LLM (Llama 3.1 8B ~180 ms, Gemini 2.5 Flash ~380 ms, GPT-5 Nano ~300 ms) with ElevenLabs Flash v2.5 and Deepgram Nova.
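
As a rough check of the budget, the sketch below adds up the per-stage ranges from the table and works out how much room is left for the LLM. It assumes the stages run back to back and that p50 values can simply be summed, which is only an approximation; the names are illustrative.

```python
BUDGET_MS = 800  # target round-trip latency from this page

# (low, high) typical p50 per stage, in milliseconds, taken from the table.
STAGES_MS = {
    "endpointing": (200, 200),
    "stt": (100, 150),
    "llm_ttft": (180, 900),
    "tts_first_audio": (75, 150),
}


def turn_latency_range(stages):
    """Sum the low ends and the high ends across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def llm_headroom_ms(stages, budget=BUDGET_MS):
    """LLM time-to-first-token that still fits the budget when every
    other stage lands at the high end of its range."""
    others = sum(hi for name, (_, hi) in stages.items() if name != "llm_ttft")
    return budget - others


print(turn_latency_range(STAGES_MS))  # (555, 1400)
print(llm_headroom_ms(STAGES_MS))     # 300
```

Under these assumptions a turn lands between roughly 555 ms and 1400 ms, so the ~800 ms target is only met when the LLM's time-to-first-token stays near 300 ms or below.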

Reliability

  • Per-provider failover — LLM, STT, and TTS each fail over to a configured backup if the primary errors.
  • Auto-reconnect — the STT socket reconnects with backoff; a graceful fallback message plays if speech can't recover.
  • A max-duration safety timer ends runaway calls.
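
The failover and reconnect behavior above can be sketched as follows. This is an illustration of the general pattern, not our implementation: the function names, the 0.5 s base delay, the 8 s cap, and the five attempts are all assumptions.

```python
import time
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try the primary first, then each configured backup, in order."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # any provider error triggers failover
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def reconnect_with_backoff(
    connect: Callable[[], T],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Retry connect() with backoff; return None if it never succeeds,
    so the caller can play the fallback message instead."""
    for delay in backoff_delays(attempts):
        try:
            return connect()
        except ConnectionError:
            sleep(delay)
    return None
```

Passing sleep as a parameter keeps the retry loop testable without real waiting.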

Languages & voices

Agents support English, German, Spanish, Mandarin, and French. Pick the language, STT model, TTS provider/voice, and optional LLM/STT/TTS fallbacks in the agent's Voice settings.

Call recordings

Turn on Recording in the agent's Overview to capture calls. When recording is enabled, the call is recorded dual-channel. On hangup, the recording is copied into private storage and attached to the call in Logs, where you can play it back or download it. Access is owner-only, through short-lived signed links.

Set a retention period under Settings → GDPR & Privacy to auto-delete recordings after 7–365 days, or keep them indefinitely. You can also delete an individual recording from its call log.
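
For illustration, the retention rule can be expressed as a small helper that returns when a recording becomes due for deletion. The function name and the use of None for "keep indefinitely" are assumptions made for this sketch; only the 7–365 day range comes from the setting described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_DAYS, MAX_DAYS = 7, 365  # allowed retention range


def deletion_time(recorded_at: datetime, retention_days: Optional[int]) -> Optional[datetime]:
    """Return when a recording is due for auto-deletion, or None to keep it indefinitely."""
    if retention_days is None:
        return None
    if not MIN_DAYS <= retention_days <= MAX_DAYS:
        raise ValueError(f"retention must be {MIN_DAYS}-{MAX_DAYS} days")
    return recorded_at + timedelta(days=retention_days)


recorded = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(deletion_time(recorded, 30))  # 2025-01-31 00:00:00+00:00
```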

Important · Recording is consent-gated by the agent's Recording toggle — it's off unless you enable it. Check local laws on call-recording consent.