The Audio Tab controls all settings related to the agent’s voice output and audio processing. This includes which voice model speaks, how fast, how consistently, and how loudly it speaks, and whether audio caching and noise reduction are applied.

Voice Settings

Voice Model

The text-to-speech (TTS) voice model used to generate the agent’s speech. Available voices depend on the language selected in the Agent Tab. Each voice has a distinct accent, tone, and speaking style. To preview a voice before selecting it, click the play button next to the voice name. The preview plays a short sample using the current speed and stability settings.
Match the voice to your use case. A warm, conversational voice suits a support agent. A clear, neutral voice suits an automated scheduling agent. Test with real callers before committing to a voice for production.

Speed

Controls the speech rate. Range: 0.5 (slow) to 2.0 (fast). Default is 1.0.
Value      | Effect
0.7 - 0.9  | Slower speech. Good for elderly callers or complex instructions.
1.0        | Normal speed.
1.1 - 1.3  | Slightly faster. Suitable for callers who expect efficiency.
1.4+       | Fast. May be hard to follow for some callers. Test carefully.
Speed affects perceived professionalism and comprehension. Run audio testing with real human listeners at your target speed before deploying.

Stability

Controls the consistency of the voice’s expression and prosody. Range: 0.0 (most expressive, most variable) to 1.0 (most stable, least expressive).
Value      | Effect
0.0 - 0.3  | Highly expressive. Voice varies significantly across sentences. Sounds more natural but less predictable.
0.4 - 0.6  | Moderate. Good balance for conversational agents.
0.7 - 1.0  | Highly stable. Consistent delivery with less variation. Sounds more robotic at high values.
For task-focused agents (appointment booking, order lookup), higher stability (0.6-0.8) produces more professional, consistent output. For empathetic or conversational agents, lower stability (0.3-0.5) sounds more human.

Volume

Controls the output volume relative to baseline. Range: -6 dB to +6 dB. Default is 0 dB. Adjust if your callers consistently report the agent being too quiet or too loud. Use audio testing to verify perceived volume before changing this setting in production.
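The documented ranges for speed, stability, and volume can be checked before a configuration is saved. The following is a minimal sketch, not DialNexa's API: the VoiceSettings class, its field names, and the 0.5 stability default are assumptions made for illustration. The db_to_gain helper uses the standard decibel-to-amplitude conversion (gain = 10^(dB/20)); whether the platform applies volume in exactly this way is not stated in this page.

```python
from dataclasses import dataclass


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value falls outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the three numeric voice settings."""
    speed: float = 1.0       # documented range: 0.5 to 2.0, default 1.0
    stability: float = 0.5   # documented range: 0.0 to 1.0 (default here is assumed)
    volume_db: float = 0.0   # documented range: -6 dB to +6 dB, default 0 dB

    def __post_init__(self) -> None:
        _check_range("speed", self.speed, 0.5, 2.0)
        _check_range("stability", self.stability, 0.0, 1.0)
        _check_range("volume_db", self.volume_db, -6.0, 6.0)


def db_to_gain(db: float) -> float:
    """Convert a decibel offset into a linear amplitude multiplier."""
    return 10 ** (db / 20)


settings = VoiceSettings(speed=1.1, stability=0.7, volume_db=3.0)
print(round(db_to_gain(settings.volume_db), 3))
```

Under this conversion, the +6 dB maximum roughly doubles the signal amplitude and the -6 dB minimum roughly halves it, which is why small adjustments are usually enough.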

Audio Cache

Enable Audio Cache

When enabled, DialNexa caches pre-rendered TTS audio for static phrases in the agent’s responses. On subsequent calls, cached audio plays instantly instead of being rendered in real time.
What gets cached:
  • Static portions of the welcome message
  • Fixed phrases in prompt templates (e.g., “One moment while I look that up.”)
  • Predetermined responses that do not vary by caller
What is not cached:
  • Responses containing dynamic variable content (e.g., {{customer_name}})
  • LLM-generated responses that vary per turn
Enabling the audio cache skips the TTS rendering step for repeated static phrases, which lowers the time-to-first-audio on turns that use cached content.
The audio cache is populated on the first call that uses each phrase. The first caller gets no latency benefit for that phrase; subsequent callers do.
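The caching behavior described above can be illustrated with a small sketch. This is not DialNexa's implementation: the PhraseAudioCache class, the is_cacheable check, and the fake_render function are hypothetical names used only to show the rule that static phrases are rendered once and reused, while phrases containing a {{variable}} placeholder are rendered every time. LLM-generated responses cannot be detected by a pattern like this and would simply bypass the cache.

```python
import re
from typing import Callable, Dict, List

# Matches template placeholders such as {{customer_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def is_cacheable(phrase: str) -> bool:
    """A phrase is treated as static if it contains no {{variable}} placeholder."""
    return PLACEHOLDER.search(phrase) is None


class PhraseAudioCache:
    """Hypothetical per-agent cache: static phrases are rendered once, then reused."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render
        self._store: Dict[str, bytes] = {}

    def get_audio(self, phrase: str) -> bytes:
        if not is_cacheable(phrase):
            return self._render(phrase)  # dynamic content: always rendered live
        if phrase not in self._store:
            # First use of this phrase: pay the rendering cost and populate the cache.
            self._store[phrase] = self._render(phrase)
        return self._store[phrase]

    def clear(self) -> None:
        self._store.clear()


calls: List[str] = []


def fake_render(text: str) -> bytes:
    """Stand-in for a TTS call; records each render so misses can be counted."""
    calls.append(text)
    return text.encode("utf-8")


cache = PhraseAudioCache(fake_render)
cache.get_audio("One moment while I look that up.")  # first caller: rendered
cache.get_audio("One moment while I look that up.")  # later caller: served from cache
cache.get_audio("Hello, {{customer_name}}.")         # dynamic: rendered
cache.get_audio("Hello, {{customer_name}}.")         # dynamic: rendered again
print(len(calls))  # 3
```

The static phrase is rendered once across two requests, while the templated greeting is rendered on both, giving three renders in total.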

Clear Cache

Click Clear Cache to invalidate all cached audio for this agent. Do this after changing the voice model, speed, or stability settings. Old cached audio rendered with prior settings will not match the new configuration.
If you change voice settings and do not clear the cache, callers may hear a mix of old and new voice styles for different phrases until the cache expires naturally.
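One way to remember when a clear is needed is to record which voice settings the cache was rendered with and compare them against the current ones. The sketch below assumes you track this yourself outside the platform; settings_fingerprint, cache_needs_clearing, and the "voice-a" identifier are hypothetical and are not part of any documented DialNexa interface. Only the three settings named above (voice model, speed, stability) are included in the fingerprint.

```python
import hashlib
import json


def settings_fingerprint(voice_model: str, speed: float, stability: float) -> str:
    """Hash the settings that affect how cached audio sounds."""
    payload = json.dumps(
        {"voice_model": voice_model, "speed": speed, "stability": stability},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_needs_clearing(rendered_with: str, current: str) -> bool:
    """True when the cache was rendered under different voice settings."""
    return rendered_with != current


before = settings_fingerprint("voice-a", 1.0, 0.5)  # settings when the cache was filled
after = settings_fingerprint("voice-a", 1.1, 0.5)   # speed changed since then
print(cache_needs_clearing(before, after))  # True
```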

Denoising

Denoising Mode

Controls noise reduction applied to inbound caller audio before it is transcribed. This affects transcription accuracy for callers in noisy environments.
Mode        | Description
Off         | No processing applied to inbound audio.
Mild        | Light noise reduction. Removes low-level background noise without affecting voice quality.
Aggressive  | Strong noise reduction. Reduces background noise significantly. May slightly affect voice fidelity at very high noise levels.
Enable denoising when your callers are likely to call from noisy environments (mobile, outdoors, contact center floor). Disable it when callers typically call from quiet environments to avoid unnecessary processing.

Save Changes

Click Save to write your changes to the current draft. Remember to clear the audio cache after changing the voice model, speed, or stability settings so that callers hear consistent audio.