Speech Generation API

Beta Access Required: The Speech API requires whitelisted access.

To request access, email sales@demeterics.com with:

  • Subject: "Feature Access Request"
  • Feature name: "Text-to-Speech (TTS)"

For multi-speaker podcast generation, also request: "TTS Multi-Speaker"

The Demeterics Speech API provides a unified Text-to-Speech (TTS) interface across multiple providers. Convert text to natural-sounding audio with a single API while automatically tracking usage, costs, and storing generated audio for analysis.

Overview

Base URL: https://api.demeterics.com/tts/v1

Features:

  • Unified API: Single endpoint for OpenAI, ElevenLabs, Google Cloud TTS, Murf.ai, Groq Orpheus, and Google Gemini
  • Multi-Speaker: Generate podcasts and dialogues either as a single Gemini call (up to 2 speakers, native prosody) or via the dialog meta-providers (gemini-dialog, openai-dialog, elevenlabs-dialog, murf-dialog) for unlimited speakers and large dialogues
  • Auto-tracking: Every request logged to BigQuery with full observability
  • Audio Storage: Generated audio stored in GCS with 15-minute signed URLs
  • BYOK Support: Use your own provider API keys with dual-key authentication
  • Cost Control: Automatic credit billing with 15% managed or 10% BYOK fee

Authentication

Managed Keys (Default)

Use only your Demeterics API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

Bring Your Own Key (BYOK)

Use the dual-key format to provide your own provider API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key;sk-your_openai_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

The format is: [demeterics_api_key];[provider_api_key]

BYOK Benefits:

  • 10% service fee instead of 15%
  • Use your own rate limits and quotas
  • Provider costs billed directly to your account
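
The dual-key string is easy to assemble incorrectly in client code. A minimal sketch (the helper name is ours, not part of any Demeterics SDK) that produces the Authorization header value for both modes:

```python
def build_auth_header(demeterics_key, provider_key=None):
    """Authorization header value for managed (single-key) or BYOK (dual-key) mode."""
    if provider_key:
        # BYOK: [demeterics_api_key];[provider_api_key]
        return f"Bearer {demeterics_key};{provider_key}"
    return f"Bearer {demeterics_key}"
```

Pass the result as the Authorization header in any of the examples below.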

Endpoints

Generate Speech

POST /tts/v1/generate

Convert text to speech audio.

Request Body:

Field Type Required Description
provider string Yes Target provider: openai, elevenlabs, google, murf, groq, gemini, openai-dialog, elevenlabs-dialog, murf-dialog, gemini-dialog
model string No TTS model (provider-specific)
voice string No Voice identifier (single speaker)
input string Yes Text to convert (max varies by provider)
format string No Output format: mp3, wav, opus, flac
speed float No Playback speed: 0.25-4.0 (default: 1.0)
language string No Language code (ISO 639-1)
speakers array No Multi-speaker config — required for gemini (max 2 speakers) and the *-dialog providers (unlimited speakers)
temperature float No Sampling temperature, 0.0–2.0. Honored by gemini and gemini-dialog for prosody control; silently ignored by all other providers, so it is safe to set in a request that may fall back to a non-Gemini provider.

Example Request:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello, welcome to Demeterics!",
    "format": "mp3"
  }'

Response:

{
  "id": "01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "openai",
  "model": "tts-1",
  "voice": "alloy",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 2.3,
  "cost_usd": 0.00023,
  "usage": {
    "input_chars": 31
  },
  "metadata": {
    "format": "mp3",
    "sample_rate": 24000,
    "channels": 1,
    "generation_ms": 450
  }
}
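
Because signed URLs expire after 15 minutes, download the audio promptly rather than storing the URL. A standard-library sketch (function names are ours):

```python
import urllib.request

def audio_filename(resp):
    """Local filename derived from a /generate response: <id>.<format>."""
    return f"{resp['id']}.{resp['metadata']['format']}"

def download_audio(resp, dest_dir="."):
    """Fetch the audio before the 15-minute signed URL expires."""
    path = f"{dest_dir}/{audio_filename(resp)}"
    urllib.request.urlretrieve(resp["audio_url"], path)
    return path
```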

List Voices

GET /tts/v1/voices?provider={provider}

List available voices for a provider.

Query Parameters:

Parameter Type Required Description
provider string Yes Provider: openai, elevenlabs, google, murf

Example Request:

curl -X GET "https://api.demeterics.com/tts/v1/voices?provider=openai" \
  -H "Authorization: Bearer dmt_your_api_key"

Response:

{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Neutral and balanced",
      "gender": "neutral"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Clear and articulate",
      "gender": "male"
    }
  ]
}

Providers

OpenAI

Models:

  • gpt-4o-mini-tts - Latest model with better steerability. Dual-priced: $0.60/1M input tokens + $12/1M audio output tokens — effective $0.015/min of audio ($20 per 1M chars). Substantially cheaper than ElevenLabs.
  • tts-1 - Fast and efficient (legacy)
  • tts-1-hd - Higher quality (legacy)

Voices:

  • alloy - Neutral and balanced
  • ash - Warm and conversational
  • ballad - Soft and melodic
  • coral - Friendly and approachable
  • echo - Clear and articulate
  • fable - Expressive and dynamic
  • onyx - Deep and authoritative
  • nova - Friendly and warm
  • sage - Calm and measured
  • shimmer - Bright and optimistic
  • verse - Dynamic and engaging

Supported Formats: mp3, opus, aac, flac, wav, pcm

Max Characters: 4,096

ElevenLabs

Models:

  • eleven_v3 - Most expressive model — human-like speech with high emotional range, 70+ languages, supports vocal directions (recommended for high-quality content)
  • eleven_multilingual_v2 - Premium quality, 29 languages, 10K char limit
  • eleven_turbo_v2_5 - High quality + speed (~250-300ms), 32 languages, 40K char limit
  • eleven_turbo_v2 - Fast, English only
  • eleven_flash_v2_5 - Ultra-fast (~75ms), 32 languages, 50% lower cost — great for drafts and real-time
  • eleven_monolingual_v1 - Deprecated February 28, 2026 — migrate to eleven_v3

Vocal Directions (eleven_v3):

ElevenLabs v3 supports inline audio tags to direct the performance style:

[cheerful] Welcome to our channel!
[whisper] But here's a secret...
[dramatic] Everything is about to change.
[sarcastic] Oh sure, that went exactly as planned.

Available directions include: [cheerful], [whisper], [dramatic], [sarcastic], [excited], [friendly], [warm], [professionally], [authoritatively], [breathy], and more.
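
A small helper (our own naming, not an ElevenLabs API) can keep direction tags consistent when assembling an input string:

```python
def with_direction(text, direction=None):
    """Prefix a line with an eleven_v3 audio tag such as [cheerful]."""
    return f"[{direction}] {text}" if direction else text

# Assemble a performance-directed input string:
script = "\n".join([
    with_direction("Welcome to our channel!", "cheerful"),
    with_direction("But here's a secret...", "whisper"),
    with_direction("Everything is about to change.", "dramatic"),
])
```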

Voices: Over 100 pre-made voices plus custom voice cloning

Supported Formats: mp3, pcm, ulaw

Max Characters: 5,000 (eleven_v3), 10,000 (multilingual_v2), 40,000 (turbo/flash v2.5)
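
Because per-request character caps differ by provider, long inputs must be split client-side before calling /generate. A rough sentence-boundary chunker, assuming the limits quoted in this document (helper names are ours; a single sentence longer than the limit still overflows):

```python
PROVIDER_LIMITS = {          # per-request character caps quoted in this doc
    "openai": 4096,
    "elevenlabs": 5000,      # eleven_v3; turbo/flash models allow more
    "google": 5000,
    "gemini": 8000,
    "murf": 10_000,
}

def chunk_text(text, provider):
    """Greedily pack sentences into chunks that fit under the provider's limit."""
    limit = PROVIDER_LIMITS[provider]
    chunks, current = [], ""
    for sentence in text.split(". "):
        candidate = f"{current}. {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Generate one request per chunk and concatenate the resulting audio client-side.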

Google Cloud TTS

Models:

  • standard - Basic quality
  • neural2 - Neural network based
  • wavenet - High quality WaveNet
  • journey - Conversational style
  • studio - Professional quality

Voices: 220+ voices across 40+ languages

Supported Formats: mp3, wav, ogg

Max Characters: 5,000

Murf.ai

Models:

  • GEN2 - Latest generation, highest quality ($0.03/1000 chars)
  • FALCON - Fast streaming model ($0.01/1000 chars) ← Recommended for Voice-to-Voice

Voices: 120+ voices across 20+ languages including:

  • en-US-natalie - Natalie (US English, female) — clear, professional
  • en-US-samantha - Samantha (US English, female) — warm, conversational
  • en-US-terrell - Terrell (US English, male) — deep, authoritative
  • en-US-wayne - Wayne (US English, male) — friendly, casual
  • en-UK-hazel - Hazel (UK English, female) — British accent
  • en-UK-ruby - Ruby (UK English, female) — British, professional
  • en-UK-maisie - Maisie (UK English, female) — British, youthful
  • en-AU-lincoln - Lincoln (Australian, male) — Australian accent

Supported Formats: mp3, wav, flac, ogg, pcm, alaw, ulaw

Max Characters: 10,000

Features:

  • Voice styles (conversational, newscast, etc.)
  • Speed and pitch control
  • Multi-language support with native locales
  • Streaming support via /v1/speech/stream endpoint

Murf Falcon Streaming (Widget Integration)

The FALCON model supports real-time audio streaming, used internally by the AI Chat Widget's Voice-to-Voice feature.

Note: Murf Falcon streaming is not exposed as a standalone Demeterics API endpoint. It's used automatically when Voice-to-Voice is enabled on your AI Chat Widget. For direct TTS generation, use POST /tts/v1/generate with provider: "murf" and model: "FALCON".

How Voice-to-Voice Works:

When Voice-to-Voice is enabled, the widget uses a two-phase streaming architecture:

  1. Phase 1: POST /api/widget/voice

    • Uploads the user's audio recording
    • Returns the transcript, response text, and a stream_token
    • Text is displayed in the widget immediately
  2. Phase 2: GET /api/widget/voice/stream?token=X

    • A Server-Sent Events (SSE) stream delivers audio chunks
    • The Web Audio API plays chunks as they arrive
    • ~130ms time-to-first-audio (TTFA)

Additional streaming endpoints (internal use):

  • GET /api/widget/voice/stream/mp3 — MP3 format stream
  • GET /api/widget/voice/stream/raw — Raw audio stream
  • WS /api/widget/voice/ws — WebSocket streaming
  • WS /api/widget/voice/live — Full-duplex WebSocket

Performance:

  • ~130ms time-to-first-audio
  • WAV format at 24kHz mono
  • Optimized for low-latency conversational AI

Cost: $0.01 per 1,000 characters (billed when stream is consumed)

Google Gemini TTS

Beta Access: Gemini TTS with multi-speaker support is available to whitelisted users. Contact support to request access.

Models:

  • gemini-3.1-flash-tts-preview - Best quality, 70+ languages, audio tags for vocal style control (default)
  • gemini-2.5-flash-preview-tts - Older, cheaper alternative
  • gemini-2.5-pro-preview-tts - Higher quality (Pro tier)

Voices (30 prebuilt voices):

  • Puck - Upbeat
  • Kore - Firm
  • Charon - Informative
  • Zephyr - Bright
  • Fenrir - Excitable
  • Leda - Youthful
  • Aoede - Breezy
  • Sulafat - Warm
  • Achird - Friendly
  • And 21 more...

Supported Formats: wav

Max Characters: 8,000

Features:

  • Multi-speaker support: Up to 2 speakers with different voices
  • 30 prebuilt voice options
  • Ideal for podcasts, dialogues, and conversational content

Multi-Speaker Mode (Podcasts & Dialogues)

Generate conversational audio with up to 2 distinct speakers, each with their own voice. Perfect for:

  • Podcasts with host and guest
  • Dialogues between characters
  • Interview-style content
  • Educational back-and-forth explanations

Request Body (Multi-Speaker):

Field Type Required Description
provider string Yes Must be gemini
model string No gemini-3.1-flash-tts-preview (default)
input string Yes Dialogue with speaker labels
speakers array Yes Speaker-to-voice mapping (max 2)
format string No Output format (default: wav)

Speaker Configuration:

Each speaker object has:

Field Type Required Description
name string Yes Speaker label (must match input text)
voice string Yes Voice ID (e.g., Puck, Kore)

Example: Podcast Generation

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "gemini",
    "model": "gemini-3.1-flash-tts-preview",
    "input": "Host: Welcome to the AI Insights podcast! Today we explore the future of voice AI.\nGuest: Thanks for having me! Voice technology is transforming how we interact with machines.",
    "speakers": [
      {"name": "Host", "voice": "Puck"},
      {"name": "Guest", "voice": "Kore"}
    ],
    "format": "wav"
  }'

Response:

{
  "id": "tts_01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "gemini",
  "model": "gemini-3.1-flash-tts-preview",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 8.5,
  "cost_usd": 0.00125,
  "usage": {
    "input_chars": 156
  }
}

Python Example:

import requests

response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "gemini",
        "input": """Host: What's the biggest challenge in AI today?
Guest: I'd say it's making AI accessible to everyone, not just tech companies.""",
        "speakers": [
            {"name": "Host", "voice": "Puck"},
            {"name": "Guest", "voice": "Kore"}
        ]
    }
)

audio_url = response.json()["audio_url"]
print(f"Podcast audio: {audio_url}")

Best Practices for Multi-Speaker:

  1. Consistent labels: Use the same speaker names throughout (e.g., Host: not Announcer:)
  2. Clear formatting: Start each line with Speaker: followed by their dialogue
  3. Voice pairing: Choose voices with distinct characteristics (e.g., upbeat + firm)
  4. Keep turns short: Shorter dialogue turns sound more natural
  5. Max 2 speakers: Gemini currently supports up to 2 distinct speakers
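
The practices above can be folded into a small request builder (names are ours) that enforces consistent labels and the 2-speaker limit before the API call is made:

```python
def build_dialogue(turns, voices):
    """Build a Gemini multi-speaker request body from (speaker, text) turns.

    Enforces Gemini's 2-speaker limit and checks every label has a voice.
    """
    names = {speaker for speaker, _ in turns}
    if len(names) > 2:
        raise ValueError("gemini multi-speaker supports at most 2 speakers")
    missing = names - voices.keys()
    if missing:
        raise ValueError(f"no voice mapped for: {sorted(missing)}")
    return {
        "provider": "gemini",
        "input": "\n".join(f"{speaker}: {text}" for speaker, text in turns),
        "speakers": [{"name": n, "voice": voices[n]} for n in sorted(names)],
        "format": "wav",
    }
```

The returned dict can be passed directly as the JSON body of POST /tts/v1/generate.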

Dialog Providers (Gemini / OpenAI / ElevenLabs / Murf)

Beta Access: Dialog providers require the TTS Multi-Speaker feature flag — same access as Gemini multi-speaker.

The gemini-dialog, openai-dialog, elevenlabs-dialog, and murf-dialog providers accept the same multi-speaker request shape as Gemini but synthesize each speaker turn independently through the underlying single-speaker provider, then concatenate the per-turn PCM with 250ms inter-turn silence.

When to use:

  • Gemini multi-speaker is unavailable (fallback path) — pick elevenlabs-dialog or openai-dialog (independent infrastructure)
  • You need more than 2 speakers (Gemini's hard limit)
  • Your dialogue exceeds Gemini's 4000-byte multi-speaker cap — pick gemini-dialog to keep Gemini voice quality, or any other dialog provider
  • You want OpenAI / ElevenLabs / Murf voices in dialog form

Tradeoff — read this before adopting: Each turn is synthesized in isolation. There is no cross-turn prosody conditioning, so a reply does not react to the question's intonation. The output sounds like two voice actors reading separate lines, not a real conversation. For high-fidelity podcast dialogue, prefer native gemini multi-speaker; use the dialog providers as a length-relief tier (gemini-dialog), a reliability tier (elevenlabs-dialog/openai-dialog), or when Gemini's caps don't fit.

gemini-dialog does NOT add reliability. It calls the same generativelanguage.googleapis.com endpoint as native gemini multi-speaker. When Google has an outage, both go down together. Use elevenlabs-dialog or openai-dialog for the outage-fallback role — they have independent infrastructure.

Available providers:

Provider ID Underlying Default Model Cost / 1M chars (managed)
gemini-dialog Google Gemini gemini-3.1-flash-tts-preview $46.00
openai-dialog OpenAI TTS gpt-4o-mini-tts $23.00
elevenlabs-dialog ElevenLabs eleven_v3 $345.00
murf-dialog Murf.ai FALCON $15.30

elevenlabs-dialog default is eleven_v3 — highest emotional range and supports inline audio tags ([cheerful], [whisper], [dramatic]). For a cheaper rescue tier, override with model: "eleven_flash_v2_5" ($69.00 / 1M chars) or use openai-dialog instead.

Request shape: Identical to Gemini multi-speaker (speakers array + input with <Name>: line prefixes).

Field Type Required Description
provider string Yes gemini-dialog, openai-dialog, elevenlabs-dialog, or murf-dialog
model string No Underlying provider model — see table above for defaults
input string Yes Dialogue text with speaker labels (Host: ...\nGuest: ...)
speakers array Yes Speaker-to-voice mapping (no upper limit)
format string No Currently wav only — mp3 transcoding requires ffmpeg, which the runtime image does not include

Limits:

  • Total input: 50,000 chars across all turns
  • Per turn: Each individual turn must fit the underlying provider's per-call limit — OpenAI 4096 chars, ElevenLabs 5000 chars, Murf 10000 chars
  • Speakers: No upper limit; speaker names must be alphanumeric only
  • Sample rate: All turns must produce the same sample rate. Defaults are aligned to 24kHz; Murf GEN2 emits 44.1kHz and would mismatch — stick with FALCON for murf-dialog
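
These limits are cheap to enforce client-side before paying for a request that will be rejected. A validation sketch using the per-turn limits quoted above (helper names are ours):

```python
TOTAL_LIMIT = 50_000
PER_TURN_LIMITS = {          # per-call limits of the underlying providers
    "openai-dialog": 4096,
    "elevenlabs-dialog": 5000,
    "murf-dialog": 10_000,
}

def validate_dialog(provider, turns):
    """Raise ValueError if a dialog request would be rejected; return None if OK."""
    if sum(len(text) for _, text in turns) > TOTAL_LIMIT:
        raise ValueError("total input exceeds 50,000 chars")
    limit = PER_TURN_LIMITS.get(provider)
    for speaker, text in turns:
        if not speaker.isalnum():
            raise ValueError(f"speaker name must be alphanumeric: {speaker!r}")
        if limit is not None and len(text) > limit:
            raise ValueError(f"turn for {speaker} exceeds the {limit}-char limit")
```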

Example — three-turn dialog with ElevenLabs:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "elevenlabs-dialog",
    "model": "eleven_flash_v2_5",
    "input": "Host: Welcome to the show.\nGuest: Thanks for having me.\nHost: Lets dive in.",
    "speakers": [
      {"name": "Host",  "voice": "JBFqnCBsd6RMkjVDRZzb"},
      {"name": "Guest", "voice": "EXAVITQu4vr4xnSDxMaL"}
    ]
  }'

How it works:

  1. Input is split into ordered turns by matching line-leading speaker tags against the configured speakers array. Mid-text colons (I said: "hi") are not mistaken for tags because the candidate must be in the speaker whitelist.
  2. Each turn is sent to the underlying single-speaker adapter independently with the speaker's mapped voice.
  3. Per-turn audio is decoded to raw PCM (WAV header stripped if present), 250ms of silence is inserted between turns, and the result is wrapped as a single 16-bit mono WAV.
  4. The response audio_url, duration_seconds, and cost_usd reflect the joined audio. Per-turn cost is summed.
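
Steps 2 and 3 can be illustrated with a toy version of the joining logic, assuming every turn is already 16-bit mono PCM at 24 kHz. This is a sketch of the described behavior, not the service's actual code:

```python
SAMPLE_RATE = 24_000                           # 16-bit mono PCM; all turns must match
GAP = b"\x00\x00" * int(SAMPLE_RATE * 0.25)    # 250 ms of silence between turns

def join_turns(pcm_turns):
    """Concatenate per-turn PCM buffers with a 250 ms gap between each pair."""
    return GAP.join(pcm_turns)
```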

Failure mode: If any single turn fails (after the underlying provider's own retries), the whole request fails with a provider_error. There is no per-turn retry-with-different-provider logic in v1.
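
Since the service does not retry a failed turn against another provider, a client-side fallback chain is the practical workaround. A standard-library sketch (function names are ours) that honors the infrastructure note above by excluding gemini-dialog from the outage tier:

```python
import json
import urllib.error
import urllib.request

def fallback_chain(primary="gemini"):
    """Ordered providers to try. gemini-dialog shares Gemini's infrastructure,
    so the fallback tiers use providers with independent backends."""
    return [primary, "elevenlabs-dialog", "openai-dialog"]

def generate_with_fallback(body, api_key):
    """Try each provider in turn; return the first successful response."""
    last_err = None
    for provider in fallback_chain():
        req = urllib.request.Request(
            "https://api.demeterics.com/tts/v1/generate",
            data=json.dumps({**body, "provider": provider}).encode(),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            last_err = err   # e.g. 502 provider_error; try the next tier
    raise RuntimeError(f"all providers failed (last status {last_err.code})")
```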

Voices: Use the same voice IDs documented above for OpenAI / ElevenLabs / Murf. Each speaker.voice is interpreted by the underlying provider, so e.g. elevenlabs-dialog accepts ElevenLabs voice IDs (or names like Rachel) and openai-dialog accepts OpenAI voice names (alloy, echo, etc.).


Groq Orpheus (Canopy Labs)

Migration Notice: PlayAI TTS models (playai-tts, playai-tts-arabic) are deprecated and will be decommissioned on December 31, 2025. Please migrate to canopylabs/orpheus-v1-english.

Models:

  • canopylabs/orpheus-v1-english - Expressive English TTS with vocal direction support

Voices (8 voices):

  • tara - Female, conversational (default)
  • leah - Female, professional
  • jess - Female, friendly
  • leo - Male, conversational
  • dan - Male, professional
  • mia - Female, warm
  • zac - Male, casual
  • zoe - Female, clear

Supported Formats: wav only

Max Characters: 200 per request

Features:

  • Vocal Directions: Control speech style with bracketed commands:
    • Conversational: [cheerful], [friendly], [casual], [warm]
    • Professional: [professionally], [authoritatively], [formally]
    • Expressive: [whisper], [excited], [dramatic], [deadpan], [sarcastic]
    • Vocal qualities: [gravelly whisper], [rapid babbling], [singsong], [breathy]
  • Fast generation via Groq infrastructure
  • More directions make the output more expressive; few or no directions keep it natural and casual
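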
  • 56% cheaper than PlayAI ($22/1M chars vs $50/1M chars)

Pricing

Managed Keys

Character-based pricing with 15% service fee:

Provider Model Cost per 1M chars
OpenAI gpt-4o-mini-tts $23.00
OpenAI tts-1 $17.25
OpenAI tts-1-hd $34.50
ElevenLabs eleven_v3 $345.00
ElevenLabs eleven_multilingual_v2 $138.00
ElevenLabs eleven_turbo_v2_5 $69.00
ElevenLabs eleven_flash_v2_5 $69.00
Google wavenet $18.40
Google neural2 $18.40
Google standard $4.60
Murf GEN2 $34.50
Murf FALCON $15.30
Groq canopylabs/orpheus-v1-english $22.00
Gemini gemini-2.5-flash-preview-tts $23.00
Gemini gemini-2.5-pro-preview-tts $57.50
Gemini gemini-3.1-flash-tts-preview $46.00
gemini-dialog gemini-2.5-flash-preview-tts $23.00
gemini-dialog gemini-3.1-flash-tts-preview $46.00
openai-dialog gpt-4o-mini-tts $23.00
elevenlabs-dialog eleven_flash_v2_5 $69.00
elevenlabs-dialog eleven_v3 $345.00
murf-dialog FALCON $15.30

Dialog provider pricing equals the underlying provider's rate. Cost is the sum of per-turn calls plus the standard 15% managed (or 10% BYOK) service fee.
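
A straight reading of the table gives a back-of-envelope estimator for dialog requests (our own helper, using each provider's default-model managed rate; actual billing is computed server-side and may differ):

```python
DIALOG_RATE_PER_M = {        # managed rate for each provider's default model (USD / 1M chars)
    "gemini-dialog": 46.00,
    "openai-dialog": 23.00,
    "elevenlabs-dialog": 345.00,
    "murf-dialog": 15.30,
}

def estimate_managed_cost(provider, turns):
    """Summed per-turn characters times the managed rate for the default model."""
    chars = sum(len(text) for text in turns)
    return round(chars / 1_000_000 * DIALOG_RATE_PER_M[provider], 6)
```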

BYOK

10% service fee on top of provider costs. Provider costs billed directly to your account.

Error Handling

Error Response Format:

{
  "error": {
    "type": "invalid_request",
    "message": "Input text exceeds maximum length",
    "code": "text_too_long"
  }
}

Common Error Codes:

Code HTTP Status Description
invalid_provider 400 Unknown provider specified
invalid_voice 400 Voice not available for provider
text_too_long 400 Input exceeds provider limit
insufficient_credits 402 Not enough credits
provider_error 502 Provider API failed
rate_limited 429 Too many requests
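
Of these, 429 (rate_limited) and 502 (provider_error) are worth retrying; the 4xx validation errors are not. A retry sketch with jittered exponential backoff (helper names are ours):

```python
import random
import time

RETRYABLE = {429, 502}       # rate_limited, provider_error

def backoff_delays(attempts=5, base=0.5, cap=30.0):
    """Exponential delays: 0.5s, 1s, 2s, ... capped at `cap` seconds."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

def call_with_retry(do_request, attempts=5):
    """do_request() -> (status, body); retried on 429/502 with jittered backoff."""
    for delay in backoff_delays(attempts):
        status, body = do_request()
        if status not in RETRYABLE:
            return status, body
        time.sleep(delay + random.uniform(0, delay / 2))
    return status, body      # give up: return the last retryable response
```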

Data Tracking

Every speech generation is automatically tracked in BigQuery with:

  • Transaction ID (ULID)
  • User and API key identifiers
  • Provider, model, and voice used
  • Input character count and text hash (privacy-safe)
  • Audio duration and format
  • GCS storage path
  • Cost breakdown (provider cost, service fee, total)
  • Latency metrics
  • Error information (if failed)

Query your speech generations:

SELECT
  transaction_id,
  provider,
  model,
  tts.voice,
  tts.input_chars,
  tts.duration_sec,
  total_cost
FROM `demeterics.demeterics.interactions`
WHERE interaction_type = 'tts'
  AND user_id = @user_id
  AND timing.question_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY timing.question_time DESC

SDK Support

Python

import requests

response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "openai",
        "voice": "alloy",
        "input": "Hello, world!",
        "format": "mp3"
    }
)

audio_url = response.json()["audio_url"]

Node.js

const response = await fetch("https://api.demeterics.com/tts/v1/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer dmt_your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    provider: "openai",
    voice: "alloy",
    input: "Hello, world!",
    format: "mp3"
  })
});

const { audio_url } = await response.json();

Best Practices

  1. Choose the right provider: OpenAI for speed, ElevenLabs eleven_v3 for highest quality (YouTube, podcasts), ElevenLabs eleven_flash_v2_5 for real-time, Google for language coverage
  2. Pick the right multi-speaker mode: Gemini for highest-fidelity dialogue (cross-turn prosody, max 2 speakers, 4000-byte cap); the *-dialog providers when you need more speakers, larger dialogues, or a Gemini-outage fallback
  3. Cache audio: Store frequently-used audio locally to reduce API calls
  4. Use appropriate formats: MP3 for web, WAV for editing, Opus for streaming
  5. Monitor costs: Track usage in your Demeterics dashboard
  6. Handle errors gracefully: Implement retry logic with exponential backoff
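
For practice 3, a deterministic cache key makes repeat generations free. A sketch (names are ours, not part of any SDK):

```python
import hashlib
from pathlib import Path

def cache_key(provider, voice, text, fmt="mp3"):
    """Deterministic filename: identical requests map to the same cached file."""
    digest = hashlib.sha256(f"{provider}|{voice}|{fmt}|{text}".encode()).hexdigest()
    return f"{digest[:16]}.{fmt}"

def cached_audio(cache_dir, provider, voice, text, fmt="mp3"):
    """Return the cached file path, or None if this request was never generated."""
    path = Path(cache_dir) / cache_key(provider, voice, text, fmt)
    return path if path.exists() else None
```

Check `cached_audio` before calling /generate, and write the downloaded audio to `cache_key(...)` afterward.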