Voice providers
DialNexa integrates with four TTS (text-to-speech) providers: ElevenLabs, Cartesia, SmallestAI, and Sarvam AI. Each has a different catalog size, latency profile, and configuration surface.
ElevenLabs
Broad catalog with hundreds of voices. Natural prosody and expressive range. Best for agents where realism and variety matter more than raw latency. Supports voice cloning and voice import.
Default model: eleven_flash_v2_5
Settings: Voice Model, Speed, Stability, Volume
Cartesia
Smaller catalog optimized for low-latency delivery. Suitable for high-volume deployments where TTS delay is a primary cost.
Model: sonic-2
Settings: Speed
SmallestAI
Optimized for Indian languages. Includes Indian voice personas (Diya, Raman, Ananya, Aarav, and more). Best for agents targeting Indian callers in Hindi, Hinglish, or Indian English.
Models: lightning, lightning-large, lightning-v2
Settings: Voice Model, Voice
Sarvam AI
India-focused TTS provider. Strong support for Indian English and regional Indian language contexts.
Model: bulbul:v2
Default language: en-IN
Choosing a provider
| Consideration | ElevenLabs | Cartesia | SmallestAI | Sarvam AI |
|---|---|---|---|---|
| Voice variety | Large catalog | Smaller catalog | Indian voices | Indian English |
| Latency | Moderate | Lower | Low | Moderate |
| Voice cloning | Supported | Not supported | Not supported | Not supported |
| Best for | Brand voice, expressiveness | High-volume, low-latency | Indian language callers | Indian English callers |
| Language focus | Multilingual | Multilingual | Indian languages | Indian English (en-IN) |
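The comparison above can be read as a rough decision order. The following is a minimal sketch of that reading, not a DialNexa API: the `VoiceNeeds` type, its field names, and the priority order (cloning first, then language focus, then latency) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceNeeds:
    """Hypothetical requirements for an agent; not a DialNexa type."""
    needs_voice_cloning: bool = False
    indian_languages: bool = False      # Hindi, Hinglish, or other Indian languages
    indian_english_only: bool = False   # callers speaking Indian English (en-IN)
    latency_critical: bool = False      # high-volume, low-latency deployments


def suggest_provider(needs: VoiceNeeds) -> str:
    """Suggest a provider name based on the comparison table.

    The order of the checks is an assumption: voice cloning is only
    listed as supported on ElevenLabs, so it is checked first.
    """
    if needs.needs_voice_cloning:
        return "ElevenLabs"
    if needs.indian_languages:
        return "SmallestAI"
    if needs.indian_english_only:
        return "Sarvam AI"
    if needs.latency_critical:
        return "Cartesia"
    # Fall back to the provider with the largest catalog.
    return "ElevenLabs"
```

For example, `suggest_provider(VoiceNeeds(latency_critical=True))` returns `"Cartesia"`. Real deployments will likely weigh these factors together rather than in a fixed order.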
The voice selector
The voice selector in your agent’s Speech Settings lets you browse, filter, and preview available voices. Filters available:
- Provider: ElevenLabs or Cartesia
- Gender: Male, Female, Neutral
- Language: narrows to voices that perform well in the selected language
- Use case: Conversational, Narration, Customer Support, and other catalog tags (ElevenLabs)
Each voice has a Nexa voice ID of the form vel_.... This ID is what the API and webhooks use to reference a voice. Copy a voice’s Nexa ID from the voice card in the selector. Use Nexa voice IDs when configuring agents programmatically so that display name changes on the provider side do not break your configuration.
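When building agent configuration in code, it can help to reject display names early and accept only Nexa voice IDs. The sketch below assumes that every Nexa ID starts with the `vel_` prefix shown above. The function name and the `voice`/`id` field names are hypothetical and are not taken from the DialNexa API reference.

```python
NEXA_VOICE_PREFIX = "vel_"  # prefix as shown on the voice card; assumed to apply to all voices


def build_voice_config(nexa_voice_id: str) -> dict:
    """Return a voice reference for an agent config.

    The returned structure uses hypothetical field names. Only the
    prefix check reflects something stated in this page.
    """
    has_prefix = nexa_voice_id.startswith(NEXA_VOICE_PREFIX)
    has_suffix = len(nexa_voice_id) > len(NEXA_VOICE_PREFIX)
    if not (has_prefix and has_suffix):
        raise ValueError(
            f"expected a Nexa voice ID starting with {NEXA_VOICE_PREFIX!r}, "
            f"got {nexa_voice_id!r}"
        )
    return {"voice": {"id": nexa_voice_id}}
```

Passing a provider display name (for example a persona name) raises `ValueError`, which surfaces the mistake before the configuration is sent anywhere.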
Voice and language interaction
The voice selector automatically scopes to voices compatible with your agent’s primary language. Configuring an agent with Hindi as the primary language surfaces Hindi-compatible voices. Some voices support multiple languages. If your agent uses auto language switching, confirm that the selected voice supports all candidate languages, not just the primary one. See Supported Languages for per-voice-provider language coverage.
Voice settings
ElevenLabs settings
Voice Model
ElevenLabs offers multiple synthesis models that trade quality against latency. DialNexa defaults to Flash v2.5, which is optimized for real-time conversation. Unless you have a specific reason to use an alternate model, keep this at the default.

| Model | ID | Latency | Best for |
|---|---|---|---|
| Flash v2.5 (default) | eleven_flash_v2_5 | Lowest | Real-time conversation |
| Flash v2 | eleven_flash_v2 | Low | Real-time conversation, older model |
| Turbo v2.5 | eleven_turbo_v2_5 | Low-moderate | Balance of speed and quality |
| Turbo v2 | eleven_turbo_v2 | Moderate | Quality, slightly older turbo tier |
| Multilingual v2 | eleven_multilingual_v2 | Higher | Highest quality, multi-language support |
| Multilingual STS v2 | eleven_multilingual_sts_v2 | Higher | Speech-to-speech, multilingual |
| English STS v2 | eleven_english_sts_v2 | Moderate | Speech-to-speech, English |
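If you set the model programmatically, a small lookup keyed on the IDs in the table above can catch typos before they reach the API. This is a sketch only: the model IDs and latency labels are copied from the table, while `resolve_model` and the constant names are hypothetical helpers.

```python
from typing import Optional

# Model ID -> latency label, copied from the table above.
ELEVENLABS_MODELS = {
    "eleven_flash_v2_5": "Lowest",
    "eleven_flash_v2": "Low",
    "eleven_turbo_v2_5": "Low-moderate",
    "eleven_turbo_v2": "Moderate",
    "eleven_multilingual_v2": "Higher",
    "eleven_multilingual_sts_v2": "Higher",
    "eleven_english_sts_v2": "Moderate",
}

DEFAULT_ELEVENLABS_MODEL = "eleven_flash_v2_5"


def resolve_model(model_id: Optional[str] = None) -> str:
    """Return a known model ID, falling back to the default when none is given."""
    if model_id is None:
        return DEFAULT_ELEVENLABS_MODEL
    if model_id not in ELEVENLABS_MODELS:
        known = ", ".join(sorted(ELEVENLABS_MODELS))
        raise ValueError(f"unknown ElevenLabs model {model_id!r}; known IDs: {known}")
    return model_id
```

Calling `resolve_model()` with no argument returns `eleven_flash_v2_5`, matching the recommendation to keep the default unless there is a specific reason to change it.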
Stability
- High stability (0.8 and above): recommended for transactional agents where consistent tone matters more than expressiveness
- Low stability (0.3 to 0.5): recommended for conversational agents where natural variation sounds more human
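The stability guidance above can be encoded as a simple range check. The sketch below assumes stability is expressed on a 0.0 to 1.0 scale, which this page does not state explicitly. The agent style names and the function are hypothetical.

```python
# Recommended (low, high) stability ranges, taken from the guidance above.
# The 1.0 upper bound for transactional agents is an assumption about the scale.
RECOMMENDED_STABILITY = {
    "transactional": (0.8, 1.0),
    "conversational": (0.3, 0.5),
}


def stability_in_recommended_range(stability: float, agent_style: str) -> bool:
    """Return True when the value falls inside the recommended range for the style."""
    if agent_style not in RECOMMENDED_STABILITY:
        raise ValueError(f"unknown agent style {agent_style!r}")
    low, high = RECOMMENDED_STABILITY[agent_style]
    return low <= stability <= high
```

A value outside the range is not invalid. The check only flags settings that depart from the recommendation so they can be reviewed by ear in the voice preview.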
Volume adjustment applies to TTS output only. It does not affect the caller’s microphone gain or the transcription pipeline.