Models

Every LLM, STT, and TTS model — context windows, latency, and pricing.

How to read these tables

Context is the model's context window, in tokens. Latency is a rough p50 (median) time-to-first-token; the builder uses it for its latency hint. In $/1M and Out $/1M are the provider's list prices in USD per 1,000,000 input and output tokens. These prices are passed through to you at cost, with no markup: the platform's margin comes from the token price and subscription spread, not from a usage fee.
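As a worked example of per-million pricing, the sketch below estimates the cost of a single request. The function name `request_cost_usd` and the token counts are illustrative assumptions; the prices are taken from the GPT-5 Mini row in the OpenAI table.

```python
from decimal import Decimal

# List prices are quoted per 1,000,000 tokens.
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: str, out_price: str) -> Decimal:
    """Estimate one request's cost from per-1M-token list prices.

    Prices are passed as strings so Decimal keeps them exact.
    """
    in_cost = Decimal(input_tokens) * Decimal(in_price) / TOKENS_PER_UNIT
    out_cost = Decimal(output_tokens) * Decimal(out_price) / TOKENS_PER_UNIT
    return in_cost + out_cost


# GPT-5 Mini: $0.25 in, $2.00 out per 1M tokens.
# The token counts (2,000 in, 500 out) are made up for illustration.
cost = request_cost_usd(2_000, 500, "0.25", "2.00")
print(f"${cost:.4f}")  # prints "$0.0015"
```

Output tokens usually dominate the bill for chatty agents, since the output price is several times the input price for most models listed here.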

OpenAI

Model	Context	Latency	In $/1M	Out $/1M
GPT-5	400K	750 ms	$1.25	$10.00
GPT-5 Mini	400K	450 ms	$0.25	$2.00
GPT-5 Nano	400K	300 ms	$0.05	$0.40
GPT-4.1	1.05M	650 ms	$2.00	$8.00
GPT-4o	128K	700 ms	$2.50	$10.00
GPT-4o Mini	128K	390 ms	$0.15	$0.60

Anthropic (Claude)

Model	Context	Latency	In $/1M	Out $/1M
Claude Opus 4	200K	900 ms	$15.00	$75.00
Claude Sonnet 4	200K	600 ms	$3.00	$15.00
Claude Haiku 4	200K	400 ms	$0.80	$4.00

Google (Gemini)

Model	Context	Latency	In $/1M	Out $/1M
Gemini 2.5 Pro	1.05M	700 ms	$1.25	$10.00
Gemini 2.5 Flash	1.05M	380 ms	$0.30	$2.50
Gemini 2.0 Flash	1.05M	400 ms	$0.10	$0.40
Gemini 2.0 Flash-Lite	1.05M	320 ms	$0.075	$0.30

Groq (fastest)

Model	Context	Latency	In $/1M	Out $/1M
Llama 4 Maverick	131K	280 ms	$0.20	$0.60
Llama 4 Scout	131K	230 ms	$0.11	$0.34
Llama 3.3 70B	128K	250 ms	$0.59	$0.79
Llama 3.1 8B Instant	128K	180 ms	$0.05	$0.08
Qwen 2.5 32B	128K	230 ms	$0.79	$0.79

Tip · Groq runs open models on custom hardware — it's the lowest-latency option for real-time voice.
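One way to use the Latency column is to filter models by a time-to-first-token budget and then pick the cheapest. The sketch below does that with the Groq rows above. The `Model` type and `cheapest_within` helper are hypothetical names for illustration, not part of the platform, and the choice to rank by output price is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Model(NamedTuple):
    name: str
    latency_ms: int   # rough p50 time-to-first-token, from the table
    in_price: float   # USD per 1M input tokens
    out_price: float  # USD per 1M output tokens


# Rows copied from the Groq table.
GROQ_MODELS = [
    Model("Llama 4 Maverick", 280, 0.20, 0.60),
    Model("Llama 4 Scout", 230, 0.11, 0.34),
    Model("Llama 3.3 70B", 250, 0.59, 0.79),
    Model("Llama 3.1 8B Instant", 180, 0.05, 0.08),
    Model("Qwen 2.5 32B", 230, 0.79, 0.79),
]


def cheapest_within(models: Sequence[Model],
                    max_latency_ms: int) -> Optional[Model]:
    """Return the lowest-output-price model within the latency budget,
    or None if no model is fast enough."""
    candidates = [m for m in models if m.latency_ms <= max_latency_ms]
    return min(candidates, key=lambda m: m.out_price, default=None)


fastest = min(GROQ_MODELS, key=lambda m: m.latency_ms)
print(fastest.name)  # prints "Llama 3.1 8B Instant"

pick = cheapest_within(GROQ_MODELS, max_latency_ms=240)
print(pick.name if pick else "no model fits the budget")
```

Latency here is only the table's rough p50 hint, so treat the result as a starting point rather than a guarantee.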

xAI (Grok)

Model	Context	Latency	In $/1M	Out $/1M
Grok 4	256K	800 ms	$3.00	$15.00
Grok 3	131K	700 ms	$3.00	$15.00
Grok 3 Mini	131K	450 ms	$0.30	$0.50
Grok 2 Vision	32K	750 ms	$2.00	$10.00

Mistral

Model	Context	Latency	In $/1M	Out $/1M
Mistral Large	128K	600 ms	$2.00	$6.00
Mistral Small	32K	350 ms	$0.20	$0.60
Codestral	256K	350 ms	$0.30	$0.90
Ministral 8B	128K	250 ms	$0.10	$0.10
Pixtral Large	128K	600 ms	$2.00	$6.00

OpenRouter

OpenRouter is a meta-provider: a single key routes to GPT-5, Claude Sonnet 4, Gemini 2.5 Flash, Llama 4 Maverick, and hundreds more. Pricing matches the underlying model. Use it to reach a model you don't have a direct key for.

Speech-to-Text (STT)

Provider	Models	Cost
Deepgram	Nova-3, Nova-2, Nova-2 Phone Call, Enhanced, Base	≈ $0.0048 / min
AssemblyAI	Best, Nano	Usage-based

Text-to-Speech (TTS)

Provider	Models	Cost
ElevenLabs	Flash v2.5, Turbo v2.5, Multilingual v2, Monolingual v1	≈ $0.05 / 1k chars (~$0.04/min)
OpenAI	GPT-4o mini TTS, TTS-1, TTS-1 HD	Usage-based
Deepgram Aura	Aura-2, Aura	Usage-based
Cartesia	Sonic-2, Sonic	Usage-based
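For a rough sense of speech costs per call, the sketch below combines the two approximate figures listed above: Deepgram STT at about $0.0048 per minute and ElevenLabs TTS at about $0.05 per 1,000 characters. The function name, the 10-minute call length, and the 4,000 spoken characters are assumptions for illustration. It also assumes the whole call is transcribed, and it does not cover the "Usage-based" providers, for which no rate is listed here.

```python
from decimal import Decimal

STT_USD_PER_MIN = Decimal("0.0048")     # Deepgram, approximate
TTS_USD_PER_1K_CHARS = Decimal("0.05")  # ElevenLabs, approximate


def speech_cost_usd(call_minutes: int, tts_chars: int) -> Decimal:
    """Estimate STT + TTS cost for one call.

    Assumes the full call duration is transcribed and that tts_chars
    is the number of characters the agent speaks.
    """
    stt = Decimal(call_minutes) * STT_USD_PER_MIN
    tts = Decimal(tts_chars) / Decimal(1000) * TTS_USD_PER_1K_CHARS
    return stt + tts


# Hypothetical call: 10 minutes long, agent speaks 4,000 characters.
cost = speech_cost_usd(10, 4_000)
print(f"${cost:.3f}")  # prints "$0.248"
```

LLM token costs are separate and would be added on top of this figure.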