The Audio Tab controls all settings related to the agent’s voice output and audio processing. This includes which voice model speaks, how fast and consistently it speaks, and whether audio caching and noise reduction are applied.
Voice Settings
Voice Model
The text-to-speech (TTS) voice model used to generate the agent’s speech. Available voices depend on the language selected in the Agent Tab. Each voice has a distinct accent, tone, and speaking style.
To preview a voice before selecting it, click the play button next to the voice name. The preview plays a short sample using the current speed and stability settings.
Match the voice to your use case. A warm, conversational voice suits a support agent. A clear, neutral voice suits an automated scheduling agent. Test with real callers before committing to a voice for production.
Speed
Controls the speech rate. Range: 0.5 (slow) to 2.0 (fast). Default is 1.0.
| Value | Effect |
|---|---|
| 0.7 - 0.9 | Slower speech. Good for elderly callers or complex instructions. |
| 1.0 | Normal speed. |
| 1.1 - 1.3 | Slightly faster. Suitable for callers who expect efficiency. |
| 1.4+ | Fast. May be hard to follow for some callers. Test carefully. |
Speed affects perceived professionalism and comprehension. Run audio testing with real human listeners at your target speed before deploying.
Stability
Controls the consistency of the voice’s expression and prosody. Range: 0.0 (most expressive, most variable) to 1.0 (most stable, least expressive).
| Value | Effect |
|---|---|
| 0.0 - 0.3 | Highly expressive. Voice varies significantly across sentences. Sounds more natural but less predictable. |
| 0.4 - 0.6 | Moderate. Good balance for conversational agents. |
| 0.7 - 1.0 | Highly stable. Consistent delivery with less variation. Sounds more robotic at high values. |
For task-focused agents (appointment booking, order lookup), higher stability (0.6-0.8) produces more professional, consistent output. For empathetic or conversational agents, lower stability (0.3-0.5) sounds more human.
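As a rough illustration, the guidance above can be captured as a small lookup that suggests a starting value per agent type. The ranges come from the text; the names `STABILITY_RANGES` and `suggest_stability` are hypothetical, and picking the midpoint is only one reasonable starting point, not a DialNexa rule.

```python
# Hypothetical helper: the ranges mirror the guidance in this section,
# the names and the midpoint choice are illustrative assumptions.
STABILITY_RANGES = {
    "task_focused": (0.6, 0.8),    # appointment booking, order lookup
    "conversational": (0.3, 0.5),  # empathetic or conversational agents
}


def suggest_stability(agent_type: str) -> float:
    """Return the midpoint of the recommended range as a starting value."""
    low, high = STABILITY_RANGES[agent_type]
    return round((low + high) / 2, 2)
```

Treat the result as a first setting to try, then adjust after listening tests.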
Volume
Controls the output volume relative to baseline. Range: -6 dB to +6 dB. Default is 0 dB.
Adjust if your callers consistently report the agent being too quiet or too loud. Use audio testing to verify perceived volume before changing this setting in production.
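If you manage these values outside the UI, a range check can catch out-of-bounds settings early. The sketch below is an assumption-laden illustration: the `VoiceSettings` class and its field names are hypothetical, the ranges are the ones documented on this page, and the stability default of 0.5 is assumed because no default is stated. The `db_to_amplitude` helper uses the standard decibel relation for amplitude, gain = 10^(dB/20), to show what a volume offset means in linear terms.

```python
from dataclasses import dataclass


def _check(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value lies outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class VoiceSettings:
    # Hypothetical container; field names are not DialNexa identifiers.
    speed: float = 1.0       # documented range: 0.5 to 2.0
    stability: float = 0.5   # documented range: 0.0 to 1.0 (default assumed)
    volume_db: float = 0.0   # documented range: -6 dB to +6 dB

    def __post_init__(self) -> None:
        _check("speed", self.speed, 0.5, 2.0)
        _check("stability", self.stability, 0.0, 1.0)
        _check("volume_db", self.volume_db, -6.0, 6.0)


def db_to_amplitude(db: float) -> float:
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)
```

Under this relation, +6 dB is roughly a doubling of amplitude (about 1.995) and -6 dB is roughly a halving (about 0.501), which is why small dB changes can be clearly audible.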
Audio Cache
Enable Audio Cache
When enabled, DialNexa caches pre-rendered TTS audio for static phrases in the agent’s responses. On subsequent calls, cached audio plays instantly instead of being rendered in real time.
What gets cached:
- Static portions of the welcome message
- Fixed phrases in prompt templates (e.g., “One moment while I look that up.”)
- Predetermined responses that do not vary by caller
What is not cached:
- Responses containing dynamic variable content (e.g., {{customer_name}})
- LLM-generated responses that vary per turn
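To check your own prompt templates against the lists above, a simple test is whether a phrase contains a `{{variable}}` placeholder. This is a minimal sketch, not DialNexa's actual logic: the function name `is_cacheable` and the placeholder pattern are assumptions, and it cannot detect LLM-generated responses, which vary per turn regardless of their text.

```python
import re

# Assumed placeholder syntax: {{name}}, optionally with inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def is_cacheable(phrase: str) -> bool:
    """Return True if the phrase has no template placeholders."""
    return PLACEHOLDER.search(phrase) is None
```

For example, `is_cacheable("One moment while I look that up.")` returns `True`, while `is_cacheable("Thanks, {{customer_name}}.")` returns `False`.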
Enabling the audio cache skips the TTS rendering step for repeated static phrases, which lowers time-to-first-audio on turns that use cached content.
The audio cache is populated on the first call that uses each phrase. The first caller gets no latency benefit for that phrase; subsequent callers do.
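The populate-on-first-use behavior can be pictured as a lookup that renders only on a miss. The sketch below is illustrative only: `AudioCache`, `fake_tts`, and the `renders` counter are hypothetical stand-ins, and the real service's internals are not documented here.

```python
from typing import Callable, Dict


def fake_tts(phrase: str) -> bytes:
    """Stand-in for a real TTS render; returns placeholder bytes."""
    return phrase.encode("utf-8")


class AudioCache:
    """Render a phrase on first use, then serve the stored audio."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render
        self._store: Dict[str, bytes] = {}
        self.renders = 0  # counts how often the slow path ran

    def get(self, phrase: str) -> bytes:
        if phrase not in self._store:
            # Cache miss: the first caller pays the rendering cost.
            self.renders += 1
            self._store[phrase] = self._render(phrase)
        return self._store[phrase]


cache = AudioCache(fake_tts)
cache.get("One moment while I look that up.")  # first call: rendered
cache.get("One moment while I look that up.")  # second call: served from cache
```

After both calls, `cache.renders` is 1, which reflects why only callers after the first see the latency benefit.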
Clear Cache
Click Clear Cache to invalidate all cached audio for this agent. Do this after changing the voice model, speed, or stability settings. Old cached audio rendered with prior settings will not match the new configuration.
If you change voice settings and do not clear the cache, callers may hear a mix of old and new voice styles for different phrases until the cache expires naturally.
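The mixed-voice effect follows if cached audio is looked up by phrase alone, without regard to the settings it was rendered with. The sketch below assumes exactly that; `AgentAudio` and its methods are hypothetical, and a string tag stands in for rendered audio so the voice used is visible.

```python
class AgentAudio:
    """Toy model: cache keyed by phrase only (an assumption)."""

    def __init__(self, voice: str) -> None:
        self.voice = voice
        self._cache = {}

    def speak(self, phrase: str) -> str:
        if phrase not in self._cache:
            # Render with whatever voice is configured right now.
            self._cache[phrase] = f"{self.voice}:{phrase}"
        return self._cache[phrase]

    def clear_cache(self) -> None:
        self._cache.clear()


agent = AgentAudio("voice_a")
agent.speak("Hello.")                # rendered and cached with voice_a
agent.voice = "voice_b"              # settings change, cache not cleared
stale = agent.speak("Hello.")        # still the old voice_a audio
fresh = agent.speak("Goodbye.")      # new phrase, rendered with voice_b
agent.clear_cache()
fixed = agent.speak("Hello.")        # re-rendered with voice_b
```

Here `stale` is `"voice_a:Hello."` while `fresh` is `"voice_b:Goodbye."`, the mix a caller would hear; after clearing, `fixed` is `"voice_b:Hello."`.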
Denoising
Denoising Mode
Controls noise reduction applied to inbound caller audio before it is transcribed. This affects transcription accuracy for callers in noisy environments.
| Mode | Description |
|---|---|
| Off | No processing applied to inbound audio. |
| Mild | Light noise reduction. Removes low-level background noise without affecting voice quality. |
| Aggressive | Strong noise reduction. Reduces background noise significantly. May slightly affect voice fidelity at very high noise levels. |
Enable denoising when your callers are likely to call from noisy environments (mobile, outdoors, contact center floor). Disable it when callers typically call from quiet environments to avoid unnecessary processing.
Save Changes
Click Save to save your changes to the current draft. After changing the voice model, speed, or stability settings, clear the audio cache so that callers hear consistent audio.