Models
Every LLM, STT, and TTS model, with context windows and latency for LLMs and pricing throughout.
How to read these tables
Context is the model's context window in tokens. Latency is a rough p50 (median) time-to-first-token, used for the builder's latency hint. In $/1M and Out $/1M are the provider's list prices in USD per 1,000,000 input and output tokens, passed through to you at cost with no markup; the platform's margin comes from the token price and subscription spread, not from a usage fee.
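To make the per-1M pricing concrete, here is a minimal sketch of how a single request's list-price cost could be estimated. The helper name `llm_cost_usd` and the token counts are hypothetical; the prices are taken from the GPT-5 Mini row on this page.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices in the tables are USD per 1M tokens


def llm_cost_usd(input_tokens: int, output_tokens: int,
                 in_price_per_1m: str, out_price_per_1m: str) -> Decimal:
    """Estimate the list-price cost of one request (hypothetical helper)."""
    input_cost = Decimal(in_price_per_1m) * input_tokens / TOKENS_PER_PRICE_UNIT
    output_cost = Decimal(out_price_per_1m) * output_tokens / TOKENS_PER_PRICE_UNIT
    return input_cost + output_cost


# GPT-5 Mini: $0.25 in, $2.00 out. The token counts are made up for illustration.
cost = llm_cost_usd(2_000, 500, "0.25", "2.00")
print(f"${cost}")
```

With these numbers, the input side is 2,000 × $0.25 / 1M = $0.0005 and the output side is 500 × $2.00 / 1M = $0.001, so the request costs about $0.0015. Prices are passed as strings so that `Decimal` keeps them exact rather than inheriting float rounding.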
OpenAI
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| GPT-5 | 400K | 750 ms | $1.25 | $10.00 |
| GPT-5 Mini | 400K | 450 ms | $0.25 | $2.00 |
| GPT-5 Nano | 400K | 300 ms | $0.05 | $0.40 |
| GPT-4.1 | 1.05M | 650 ms | $2.00 | $8.00 |
| GPT-4o | 128K | 700 ms | $2.50 | $10.00 |
| GPT-4o Mini | 128K | 390 ms | $0.15 | $0.60 |
Anthropic (Claude)
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| Claude Opus 4 | 200K | 900 ms | $15.00 | $75.00 |
| Claude Sonnet 4 | 200K | 600 ms | $3.00 | $15.00 |
| Claude Haiku 4 | 200K | 400 ms | $0.80 | $4.00 |
Google (Gemini)
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| Gemini 2.5 Pro | 1.05M | 700 ms | $1.25 | $10.00 |
| Gemini 2.5 Flash | 1.05M | 380 ms | $0.30 | $2.50 |
| Gemini 2.0 Flash | 1.05M | 400 ms | $0.10 | $0.40 |
| Gemini 2.0 Flash-Lite | 1.05M | 320 ms | $0.075 | $0.30 |
Groq (fastest)
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| Llama 4 Maverick | 131K | 280 ms | $0.20 | $0.60 |
| Llama 4 Scout | 131K | 230 ms | $0.11 | $0.34 |
| Llama 3.3 70B | 128K | 250 ms | $0.59 | $0.79 |
| Llama 3.1 8B Instant | 128K | 180 ms | $0.05 | $0.08 |
| Qwen 2.5 32B | 128K | 230 ms | $0.79 | $0.79 |
Tip · Groq runs open models on custom hardware. Its listed latencies (180 to 280 ms) are among the lowest on this page, which makes it a strong choice for real-time voice.
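If you want to apply the latency column programmatically, the following sketch picks the lowest-latency model that still meets a minimum context window. The `ModelRow` type and `fastest_model` function are hypothetical, and the rows are only a small sample copied from the tables on this page, with context sizes rounded as shown there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRow:
    name: str
    context_tokens: int  # rounded, as listed in the tables
    latency_ms: int      # rough p50 time-to-first-token


# A few rows copied from this page (not the full catalog).
ROWS = [
    ModelRow("GPT-5 Nano", 400_000, 300),
    ModelRow("Claude Haiku 4", 200_000, 400),
    ModelRow("Gemini 2.0 Flash-Lite", 1_050_000, 320),
    ModelRow("Llama 4 Scout", 131_000, 230),
    ModelRow("Llama 3.1 8B Instant", 128_000, 180),
    ModelRow("Ministral 8B", 128_000, 250),
]


def fastest_model(rows: list[ModelRow],
                  min_context_tokens: int = 0) -> Optional[ModelRow]:
    """Return the lowest-latency row with enough context, or None if none fit."""
    candidates = [r for r in rows if r.context_tokens >= min_context_tokens]
    return min(candidates, key=lambda r: r.latency_ms) if candidates else None


print(fastest_model(ROWS).name)
print(fastest_model(ROWS, min_context_tokens=200_000).name)
```

With no context requirement, the sample resolves to the Groq-hosted Llama 3.1 8B Instant; requiring at least 200K tokens of context rules out the Groq rows and shifts the choice to GPT-5 Nano.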
xAI (Grok)
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| Grok 4 | 256K | 800 ms | $3.00 | $15.00 |
| Grok 3 | 131K | 700 ms | $3.00 | $15.00 |
| Grok 3 Mini | 131K | 450 ms | $0.30 | $0.50 |
| Grok 2 Vision | 32K | 750 ms | $2.00 | $10.00 |
Mistral
| Model | Context | Latency | In $/1M | Out $/1M |
|---|---|---|---|---|
| Mistral Large | 128K | 600 ms | $2.00 | $6.00 |
| Mistral Small | 32K | 350 ms | $0.20 | $0.60 |
| Codestral | 256K | 350 ms | $0.30 | $0.90 |
| Ministral 8B | 128K | 250 ms | $0.10 | $0.10 |
| Pixtral Large | 128K | 600 ms | $2.00 | $6.00 |
OpenRouter
OpenRouter is a meta-provider: a single key routes to GPT-5, Claude Sonnet 4, Gemini 2.5 Flash, Llama 4 Maverick, and hundreds more. Pricing matches the underlying model. Use it to reach a model you don't have a direct key for.
Speech-to-Text (STT)
| Provider | Models | Cost |
|---|---|---|
| Deepgram | Nova-3, Nova-2, Nova-2 Phone Call, Enhanced, Base | ≈ $0.0048 / min |
| AssemblyAI | Best, Nano | Usage-based |
Text-to-Speech (TTS)
| Provider | Models | Cost |
|---|---|---|
| ElevenLabs | Flash v2.5, Turbo v2.5, Multilingual v2, Monolingual v1 | ≈ $0.05 / 1K chars (≈ $0.04 / min) |
| OpenAI | GPT-4o mini TTS, TTS-1, TTS-1 HD | Usage-based |
| Deepgram Aura | Aura-2, Aura | Usage-based |
| Cartesia | Sonic-2, Sonic | Usage-based |
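For a rough sense of speech costs per call, the sketch below combines the two approximate rates that the tables give in numbers: Deepgram STT per minute and ElevenLabs TTS per 1,000 characters. The function name `speech_cost_usd` and the example call are hypothetical. Note that the ElevenLabs per-minute figure implies about 800 spoken characters per minute ($0.04 ÷ $0.05 × 1,000); actual speech rates will vary.

```python
from decimal import Decimal

STT_USD_PER_MIN = Decimal("0.0048")     # Deepgram, approximate, from the STT table
TTS_USD_PER_1K_CHARS = Decimal("0.05")  # ElevenLabs, approximate, from the TTS table


def speech_cost_usd(call_minutes, spoken_chars: int) -> Decimal:
    """Estimate STT + TTS cost for one call (hypothetical helper).

    call_minutes: transcribed audio length; an int or a numeric string like "2.5".
    spoken_chars: number of characters the agent synthesizes.
    """
    stt = STT_USD_PER_MIN * Decimal(call_minutes)
    tts = TTS_USD_PER_1K_CHARS * spoken_chars / 1000
    return stt + tts


# A made-up 5-minute call in which the agent speaks 4,000 characters.
print(speech_cost_usd(5, 4_000))
```

Here the STT side is 5 × $0.0048 = $0.024 and the TTS side is 4,000 / 1,000 × $0.05 = $0.20, for roughly $0.224 in total. Providers listed as usage-based are left out because this page gives no rate for them.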