Speech Generation API
Beta Access Required: The Speech API requires whitelisted access.
To request access, email sales@demeterics.com with:
- Subject: "Feature Access Request"
- Feature name: "Text-to-Speech (TTS)"
For multi-speaker podcast generation, also request: "TTS Multi-Speaker"
The Demeterics Speech API provides a unified Text-to-Speech (TTS) interface across multiple providers. Convert text to natural-sounding audio with a single API that automatically tracks usage and costs and stores the generated audio for analysis.
Overview
Base URL: https://api.demeterics.com/tts/v1
Features:
- Unified API: Single endpoint for OpenAI, ElevenLabs, Google Cloud TTS, Murf.ai, Groq Orpheus, and Google Gemini
- Multi-Speaker: Generate podcasts and dialogues with up to 2 speakers (Gemini)
- Auto-tracking: Every request logged to BigQuery with full observability
- Audio Storage: Generated audio stored in GCS with 15-minute signed URLs
- BYOK Support: Use your own provider API keys with dual-key authentication
- Cost Control: Automatic credit billing with 15% managed or 10% BYOK fee
Authentication
Managed Keys (Default)
Use only your Demeterics API key:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{...}'
Bring Your Own Key (BYOK)
Use the dual-key format to provide your own provider API key:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key;sk-your_openai_key" \
  -H "Content-Type: application/json" \
  -d '{...}'
The format is: [demeterics_api_key];[provider_api_key]
BYOK Benefits:
- 10% service fee instead of 15%
- Use your own rate limits and quotas
- Provider costs billed directly to your account
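As a minimal sketch of the dual-key format described above, the helper below builds the Authorization header for either mode. The function name `build_auth_header` is hypothetical, and rejecting keys that contain a semicolon is an assumption based on the semicolon being the separator; it is not documented behavior.

```python
from typing import Optional


def build_auth_header(demeterics_key: str, provider_key: Optional[str] = None) -> dict:
    """Return the Authorization header for managed (one key) or BYOK (two keys)."""
    if not demeterics_key:
        raise ValueError("A Demeterics API key is required")
    # Assumption: ';' separates the two keys, so it cannot appear inside either one.
    if ";" in demeterics_key or (provider_key and ";" in provider_key):
        raise ValueError("API keys must not contain ';'")
    if provider_key:
        token = f"{demeterics_key};{provider_key}"  # [demeterics_api_key];[provider_api_key]
    else:
        token = demeterics_key
    return {"Authorization": f"Bearer {token}"}


print(build_auth_header("dmt_your_api_key", "sk-your_openai_key"))
```

Keeping header construction in one place makes it easy to switch between managed and BYOK billing without touching the request code.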
Endpoints
Generate Speech
POST /tts/v1/generate
Convert text to speech audio.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Target provider: openai, elevenlabs, google, murf, groq, gemini |
| model | string | No | TTS model (provider-specific) |
| voice | string | No | Voice identifier (single speaker) |
| input | string | Yes | Text to convert (max varies by provider) |
| format | string | No | Output format: mp3, wav, opus, flac |
| speed | float | No | Playback speed: 0.25-4.0 (default: 1.0) |
| language | string | No | Language code (ISO 639-1) |
| speakers | array | No | Multi-speaker config (Gemini only, max 2) |
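The field constraints in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_generate_payload` and its parameter names are hypothetical, and the server may apply additional rules (for example, provider-specific formats) that are not modeled here.

```python
ALLOWED_PROVIDERS = {"openai", "elevenlabs", "google", "murf", "groq", "gemini"}


def build_generate_payload(provider, text, voice=None, model=None,
                           audio_format=None, speed=None, language=None,
                           speakers=None):
    """Build a request body for POST /tts/v1/generate, validating documented limits."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    if not text:
        raise ValueError("input text is required")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if speakers is not None:
        if provider != "gemini":
            raise ValueError("speakers is only supported with provider 'gemini'")
        if len(speakers) > 2:
            raise ValueError("at most 2 speakers are supported")

    payload = {"provider": provider, "input": text}
    optional = {"model": model, "voice": voice, "format": audio_format,
                "speed": speed, "language": language, "speakers": speakers}
    # Only send optional fields that were actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


print(build_generate_payload("openai", "Hello, welcome to Demeterics!",
                             voice="alloy", model="tts-1", audio_format="mp3"))
```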
Example Request:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello, welcome to Demeterics!",
    "format": "mp3"
  }'
Response:
{
  "id": "01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "openai",
  "model": "tts-1",
  "voice": "alloy",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 2.3,
  "cost_usd": 0.00023,
  "usage": {
    "input_chars": 31
  },
  "metadata": {
    "format": "mp3",
    "sample_rate": 24000,
    "channels": 1,
    "generation_ms": 450
  }
}
List Voices
GET /tts/v1/voices?provider={provider}
List available voices for a provider.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider: openai, elevenlabs, google, murf |
Example Request:
curl -X GET "https://api.demeterics.com/tts/v1/voices?provider=openai" \
  -H "Authorization: Bearer dmt_your_api_key"
Response:
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Neutral and balanced",
      "gender": "neutral"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Clear and articulate",
      "gender": "male"
    }
  ]
}
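A response shaped like the one above can be filtered on the client. This small sketch assumes only the fields shown in the sample (`id`, `gender`); the helper name `voice_ids_by_gender` is hypothetical.

```python
def voice_ids_by_gender(voices_response: dict, gender: str) -> list:
    """Return the IDs of voices whose "gender" field equals the requested value."""
    return [
        voice["id"]
        for voice in voices_response.get("voices", [])
        if voice.get("gender") == gender
    ]


# Sample data copied from the documented response.
sample_response = {
    "voices": [
        {"id": "alloy", "name": "Alloy", "description": "Neutral and balanced", "gender": "neutral"},
        {"id": "echo", "name": "Echo", "description": "Clear and articulate", "gender": "male"},
    ]
}

print(voice_ids_by_gender(sample_response, "male"))  # prints ['echo']
```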
Providers
OpenAI
Models:
- gpt-4o-mini-tts: Latest model with better steerability (~85% cheaper than ElevenLabs)
- tts-1: Fast and efficient (legacy)
- tts-1-hd: Higher quality (legacy)
Voices:
- alloy: Neutral and balanced
- ash: Warm and conversational
- ballad: Soft and melodic
- coral: Friendly and approachable
- echo: Clear and articulate
- fable: Expressive and dynamic
- onyx: Deep and authoritative
- nova: Friendly and warm
- sage: Calm and measured
- shimmer: Bright and optimistic
- verse: Dynamic and engaging
Supported Formats: mp3, opus, aac, flac, wav, pcm
Max Characters: 4,096
ElevenLabs
Models:
- eleven_multilingual_v2: Best quality, 29 languages
- eleven_turbo_v2_5: Fast, English-optimized
- eleven_turbo_v2: Previous fast model
- eleven_monolingual_v1: English only
Voices: Over 100 pre-made voices plus custom voice cloning
Supported Formats: mp3, pcm, ulaw
Max Characters: 5,000
Google Cloud TTS
Models:
- standard: Basic quality
- neural2: Neural network based
- wavenet: High quality WaveNet
- journey: Conversational style
- studio: Professional quality
Voices: 220+ voices across 40+ languages
Supported Formats: mp3, wav, ogg
Max Characters: 5,000
Murf.ai
Models:
- GEN2: Latest generation, highest quality ($0.03/1000 chars)
- FALCON: Fast streaming model ($0.01/1000 chars), recommended for Voice-to-Voice
Voices: 120+ voices across 20+ languages including:
- en-US-natalie: Natalie (US English, female), clear, professional
- en-US-samantha: Samantha (US English, female), warm, conversational
- en-US-terrell: Terrell (US English, male), deep, authoritative
- en-US-wayne: Wayne (US English, male), friendly, casual
- en-UK-hazel: Hazel (UK English, female), British accent
- en-UK-ruby: Ruby (UK English, female), British, professional
- en-UK-maisie: Maisie (UK English, female), British, youthful
- en-AU-lincoln: Lincoln (Australian, male), Australian accent
Supported Formats: mp3, wav, flac, ogg, pcm, alaw, ulaw
Max Characters: 10,000
Features:
- Voice styles (conversational, newscast, etc.)
- Speed and pitch control
- Multi-language support with native locales
- Streaming support via the /v1/speech/stream endpoint
Murf Falcon Streaming
The FALCON model supports real-time audio streaming, ideal for conversational AI applications. This is used by the AI Chat Widget's Voice-to-Voice feature.
Streaming Endpoint: POST https://api.murf.ai/v1/speech/stream
Request Body:
{
  "text": "Hello, how can I help you today?",
  "voiceId": "en-US-natalie",
  "model": "FALCON",
  "format": "WAV",
  "sampleRate": 24000,
  "channelType": "MONO",
  "multiNativeLocale": "en-US"
}
Response: Raw WAV audio bytes (not JSON) — streamed as they're generated
Performance:
- ~130ms time-to-first-audio (TTFA)
- Optimized for low-latency applications
- WAV format at 24kHz mono
AI Chat Widget Integration:
When Voice-to-Voice is enabled, the widget uses a two-phase approach:
- Phase 1: POST /api/widget/voice returns the text immediately, plus a stream_token
- Phase 2: GET /api/widget/voice/stream?token=X streams Falcon audio via SSE
This architecture displays the AI's response text immediately while audio streams in the background, providing a responsive user experience.
Cost: $0.01 per 1,000 characters (billed when stream is consumed)
Google Gemini TTS
Beta Access: Gemini TTS with multi-speaker support is available to whitelisted users. Contact support to request access.
Models:
- gemini-2.5-flash-preview-tts: Fast, cost-effective (default)
- gemini-2.5-pro-preview-tts: Higher quality
Voices (30 prebuilt voices):
- Puck: Upbeat
- Kore: Firm
- Charon: Informative
- Zephyr: Bright
- Fenrir: Excitable
- Leda: Youthful
- Aoede: Breezy
- Sulafat: Warm
- Achird: Friendly
- And 21 more...
Supported Formats: wav
Max Characters: 8,000
Features:
- Multi-speaker support: Up to 2 speakers with different voices
- 30 prebuilt voice options
- Ideal for podcasts, dialogues, and conversational content
Multi-Speaker Mode (Podcasts & Dialogues)
Generate conversational audio with up to 2 distinct speakers, each with their own voice. Perfect for:
- Podcasts with host and guest
- Dialogues between characters
- Interview-style content
- Educational back-and-forth explanations
Request Body (Multi-Speaker):
| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Must be gemini |
| model | string | No | gemini-2.5-flash-preview-tts (default) |
| input | string | Yes | Dialogue with speaker labels |
| speakers | array | Yes | Speaker-to-voice mapping (max 2) |
| format | string | No | Output format (default: wav) |
Speaker Configuration:
Each speaker object has:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Speaker label (must match input text) |
| voice | string | Yes | Voice ID (e.g., Puck, Kore) |
Example: Podcast Generation
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "gemini",
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Host: Welcome to the AI Insights podcast! Today we explore the future of voice AI.\nGuest: Thanks for having me! Voice technology is transforming how we interact with machines.",
    "speakers": [
      {"name": "Host", "voice": "Puck"},
      {"name": "Guest", "voice": "Kore"}
    ],
    "format": "wav"
  }'
Response:
{
  "id": "tts_01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "gemini",
  "model": "gemini-2.5-flash-preview-tts",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 8.5,
  "cost_usd": 0.00125,
  "usage": {
    "input_chars": 156
  }
}
Python Example:
import requests

# Multi-speaker request: each label used in "input" must match a name in "speakers".
response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "gemini",
        "input": """Host: What's the biggest challenge in AI today?
Guest: I'd say it's making AI accessible to everyone, not just tech companies.""",
        "speakers": [
            {"name": "Host", "voice": "Puck"},
            {"name": "Guest", "voice": "Kore"},
        ],
    },
)
response.raise_for_status()  # Surface 4xx/5xx errors before reading the body

audio_url = response.json()["audio_url"]
print(f"Podcast audio: {audio_url}")
Best Practices for Multi-Speaker:
- Consistent labels: Use the same speaker names throughout (e.g., Host: not Announcer:)
- Clear formatting: Start each line with Speaker: followed by their dialogue
- Voice pairing: Choose voices with distinct characteristics (e.g., upbeat + firm)
- Keep turns short: Shorter dialogue turns sound more natural
- Max 2 speakers: Gemini currently supports up to 2 distinct speakers
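The labeling rules above can be checked before sending a request. The sketch below is a heuristic, not part of the API: it treats any text before the first colon on a line as a speaker label, so a dialogue line that contains a colon but no label (for example a time like 10:30) would be reported as a false positive. The function name `find_unmapped_labels` is hypothetical.

```python
import re

# Heuristic: a label is the run of word characters and spaces before the first ':'.
LABEL_PATTERN = re.compile(r"\s*(\w[\w ]*?)\s*:")


def find_unmapped_labels(input_text: str, speakers: list) -> list:
    """Return labels used in the dialogue that have no entry in the speakers config."""
    configured = {speaker["name"] for speaker in speakers}
    unmapped = []
    for line in input_text.splitlines():
        match = LABEL_PATTERN.match(line)
        if not match:
            continue
        label = match.group(1)
        if label not in configured and label not in unmapped:
            unmapped.append(label)
    return unmapped


speakers = [{"name": "Host", "voice": "Puck"}, {"name": "Guest", "voice": "Kore"}]
dialogue = "Host: Welcome back.\nGuest: Glad to be here.\nAnnouncer: Stay tuned."
print(find_unmapped_labels(dialogue, speakers))  # prints ['Announcer']
```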
Groq Orpheus (Canopy Labs)
Migration Notice: PlayAI TTS models (playai-tts, playai-tts-arabic) are deprecated and will be decommissioned on December 31, 2025. Please migrate to canopylabs/orpheus-v1-english.
Models:
- canopylabs/orpheus-v1-english: Expressive English TTS with vocal direction support
Voices (8 voices):
- tara: Female, conversational (default)
- leah: Female, professional
- jess: Female, friendly
- leo: Male, conversational
- dan: Male, professional
- mia: Female, warm
- zac: Male, casual
- zoe: Female, clear
Supported Formats: wav only
Max Characters: 200 per request
Features:
- Vocal Directions: Control speech style with bracketed commands:
  - Conversational: [cheerful], [friendly], [casual], [warm]
  - Professional: [professionally], [authoritatively], [formally]
  - Expressive: [whisper], [excited], [dramatic], [deadpan], [sarcastic]
  - Vocal qualities: [gravelly whisper], [rapid babbling], [singsong], [breathy]
- Fast generation via Groq infrastructure
- More directions = more expressive; fewer/no directions = natural, casual
- 56% cheaper than PlayAI ($22/1M chars vs $50/1M chars)
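Because Orpheus accepts at most 200 characters per request, longer text has to be split into several requests on the client. The sketch below shows one possible approach, preferring sentence boundaries, then word boundaries, then a hard cut. The helper names are hypothetical, and the splitter is not aware of bracketed vocal directions, so a direction may end up in a different chunk from the text it was meant to style.

```python
import re


def _split_long(sentence: str, max_chars: int) -> list:
    """Split one over-long sentence on word boundaries (hard-cut oversized words)."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces, current = [], ""
    for word in sentence.split():
        while len(word) > max_chars:  # a single word longer than the limit
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 200) -> list:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_text("This is a sentence. " * 30):
    print(len(chunk), chunk[:40])
```

Each chunk would then be sent as its own request and the resulting WAV files concatenated by the caller.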
Pricing
Managed Keys
Character-based pricing with 15% service fee:
| Provider | Model | Cost per 1M chars |
|---|---|---|
| OpenAI | gpt-4o-mini-tts | $0.69 |
| OpenAI | tts-1 | $17.25 |
| OpenAI | tts-1-hd | $34.50 |
| ElevenLabs | eleven_multilingual_v2 | $345.00 |
| ElevenLabs | eleven_turbo_v2_5 | $86.25 |
| Google | wavenet | $18.40 |
| Google | neural2 | $18.40 |
| Google | standard | $4.60 |
| Murf | GEN2 | $27.60 |
| Murf | FALCON | $23.00 |
| Groq | canopylabs/orpheus-v1-english | $22.00 |
| Gemini | gemini-2.5-flash-preview-tts | $11.50 |
| Gemini | gemini-2.5-pro-preview-tts | $57.50 |
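For budgeting, the managed rates can be turned into a rough per-request estimate. The sketch below copies a few rows from the table and assumes cost scales linearly with input characters; actual billing (rounding, minimum charges) may differ, and `estimate_cost_usd` is a hypothetical helper.

```python
# Managed-key rates in USD per 1M characters, copied from the table (subset).
MANAGED_RATE_PER_MILLION = {
    ("openai", "gpt-4o-mini-tts"): 0.69,
    ("openai", "tts-1"): 17.25,
    ("openai", "tts-1-hd"): 34.50,
    ("elevenlabs", "eleven_multilingual_v2"): 345.00,
    ("gemini", "gemini-2.5-flash-preview-tts"): 11.50,
}


def estimate_cost_usd(provider: str, model: str, char_count: int) -> float:
    """Estimate managed-key cost, assuming linear per-character pricing."""
    rate = MANAGED_RATE_PER_MILLION[(provider, model)]
    return rate * char_count / 1_000_000


# A full 4,096-character OpenAI request on tts-1:
print(f"${estimate_cost_usd('openai', 'tts-1', 4096):.6f}")
```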
BYOK
10% service fee on top of provider costs. Provider costs billed directly to your account.
Error Handling
Error Response Format:
{
  "error": {
    "type": "invalid_request",
    "message": "Input text exceeds maximum length",
    "code": "text_too_long"
  }
}
Common Error Codes:
| Code | HTTP Status | Description |
|---|---|---|
| invalid_provider | 400 | Unknown provider specified |
| invalid_voice | 400 | Voice not available for provider |
| text_too_long | 400 | Input exceeds provider limit |
| insufficient_credits | 402 | Not enough credits |
| provider_error | 502 | Provider API failed |
| rate_limited | 429 | Too many requests |
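One way to act on these codes is to retry only the errors that are likely to be transient. The sketch below treats rate_limited and provider_error as retryable; that classification is an assumption, not documented behavior. `send` is a hypothetical callable that performs the HTTP request and returns `(status_code, parsed_json_body)`, and `sleep` is injectable so the logic can be exercised without waiting.

```python
import random
import time

# Assumption: only these documented codes are worth retrying.
RETRYABLE_CODES = {"rate_limited", "provider_error"}


class SpeechAPIError(Exception):
    """Error built from the documented {"error": {...}} response body."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: dict) -> SpeechAPIError:
    error = body.get("error", {})
    return SpeechAPIError(status, error.get("code", "unknown"), error.get("message", ""))


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, retrying transient errors with backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        error = parse_error(status, body)
        if not error.retryable or attempt == max_attempts - 1:
            raise error
        # Exponential backoff (0.5s, 1s, 2s, ...) plus jitter to avoid synchronized retries.
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Errors such as text_too_long or insufficient_credits are raised immediately, since repeating the same request would not change the outcome.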
Data Tracking
Every speech generation is automatically tracked in BigQuery with:
- Transaction ID (ULID)
- User and API key identifiers
- Provider, model, and voice used
- Input character count and text hash (privacy-safe)
- Audio duration and format
- GCS storage path
- Cost breakdown (provider cost, service fee, total)
- Latency metrics
- Error information (if failed)
Query your speech generations:
SELECT
  transaction_id,
  provider,
  model,
  tts.voice,
  tts.input_chars,
  tts.duration_sec,
  total_cost
FROM `demeterics.demeterics.interactions`
WHERE interaction_type = 'tts'
  AND user_id = @user_id
  AND timing.question_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY timing.question_time DESC
SDK Support
Python
import requests

response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "openai",
        "voice": "alloy",
        "input": "Hello, world!",
        "format": "mp3",
    },
)
response.raise_for_status()  # Surface 4xx/5xx errors before reading the body

audio_url = response.json()["audio_url"]
Node.js
const response = await fetch("https://api.demeterics.com/tts/v1/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer dmt_your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    provider: "openai",
    voice: "alloy",
    input: "Hello, world!",
    format: "mp3"
  })
});

const { audio_url } = await response.json();
Best Practices
- Choose the right provider: OpenAI for speed, ElevenLabs for quality, Google for language coverage
- Cache audio: Store frequently-used audio locally to reduce API calls
- Use appropriate formats: MP3 for web, WAV for editing, Opus for streaming
- Monitor costs: Track usage in your Demeterics dashboard
- Handle errors gracefully: Implement retry logic with exponential backoff
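For the caching advice above, one simple approach is to derive a stable key from the request body and store the downloaded audio under that key, since the signed URLs expire. The sketch below shows only the key derivation; `cache_key` is a hypothetical helper, and it assumes that identical request bodies produce interchangeable audio.

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Return a stable SHA-256 hex digest for a request payload."""
    # Sorting keys and fixing separators makes the JSON canonical, so the same
    # logical request always hashes to the same value regardless of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


request_a = {"provider": "openai", "voice": "alloy", "input": "Hello, world!", "format": "mp3"}
request_b = {"format": "mp3", "input": "Hello, world!", "voice": "alloy", "provider": "openai"}
print(cache_key(request_a) == cache_key(request_b))  # prints True
```

A local file named after the key (for example, the digest plus the audio format as extension) can then be checked before calling the API.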