The Engine Tab controls the core AI processing behavior of the agent. It includes the primary LLM settings, fallback model configuration for resilience, performance settings that affect how quickly and reliably the agent responds during a call, and the speech transcriber configuration.

LLM Settings

Primary Model

The LLM that generates the agent’s response on every turn. Select a model from the dropdown; the available options vary by plan. Higher-capability models produce better responses but may have higher latency.
Model Tier | Trade-off
--- | ---
Fast models | Lower latency; suitable for most conversational tasks
Standard models | Balanced latency and capability
Advanced models | Highest capability for complex reasoning; higher latency
Use a fast model for simple task agents (appointment booking, FAQ handling). Reserve advanced models for agents that require multi-step reasoning, tool chaining, or nuanced decision-making.

Temperature

Controls the randomness of LLM outputs. Range: 0.0 to 1.0.
Value | Effect
--- | ---
0.0 | Fully deterministic. The model always produces the most likely response. Least creative.
0.1-0.3 | Low randomness. Good for task agents that must follow instructions precisely.
0.5-0.7 | Moderate randomness. Good for conversational agents that need natural variation.
0.8-1.0 | High randomness. May produce unexpected or off-script responses. Not recommended for production agents.
High temperature values can cause the agent to deviate from its instructions, generate inconsistent responses, or trigger the wrong functions. Use a temperature above 0.7 only after extensive simulation testing.
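DialNexa does not document how the Temperature value is applied internally. For intuition, the sketch below shows the standard technique most LLM samplers use: divide the model’s token scores (logits) by the temperature before the softmax, so low values concentrate probability on the top token and high values spread it across alternatives. Treating 0.0 as greedy decoding (always pick the top token) is a common convention and is assumed here; the function name and the logit values are hypothetical.

```python
import math


def temperature_probs(logits: list[float], temperature: float) -> list[float]:
    """Convert raw token scores (logits) into sampling probabilities.

    temperature == 0.0 is treated as greedy decoding (an assumed convention):
    all probability goes to the highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0.0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (0.0, 0.3, 1.0):
        print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

Running it shows the top token’s probability falling as the temperature rises, which is why higher values make unexpected or off-script responses more likely.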

Fallback Model

Fallback Model

A secondary LLM used when the primary model is unavailable (e.g., provider outage, rate limit) or does not respond within the fallback delay. Instead of failing the turn, the fallback model generates the response. The fallback is a resilience mechanism, not an upgrade path, so choose a reliable model that is typically simpler and faster than the primary.

Fallback Delay

The number of milliseconds to wait for the primary model before switching to the fallback model. If the primary model does not respond within this window, the fallback model processes the turn.
Setting | Effect
--- | ---
Low (500 ms) | Switches to the fallback quickly. Reduces caller-perceived latency when the primary model is slow, but may use the fallback unnecessarily during momentary primary delays.
Medium (1000-2000 ms) | Balanced. Gives the primary model reasonable time before failing over.
High (3000+ ms) | Rarely uses the fallback. The caller waits longer before the fallback takes over.
If no fallback model is configured, a turn that times out on the primary model results in a silent pause on the call. Configuring a fallback model is strongly recommended for production agents.
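The failover behavior described above can be sketched as a timeout around the primary model call. This is a hypothetical illustration, not DialNexa’s implementation: `generate_turn` and `make_model` are invented names, model calls are simulated with `asyncio.sleep`, and a `ConnectionError` stands in for a provider outage or rate-limit error. Returning `None` models the silent pause that occurs when no fallback is configured.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A model call takes the prompt and eventually returns the agent's reply.
ModelCall = Callable[[str], Awaitable[str]]


def make_model(name: str, latency_s: float) -> ModelCall:
    """Hypothetical simulated model that replies after a fixed delay."""
    async def call(prompt: str) -> str:
        await asyncio.sleep(latency_s)
        return f"{name}: {prompt}"
    return call


async def generate_turn(
    prompt: str,
    primary: ModelCall,
    fallback: Optional[ModelCall],
    fallback_delay_ms: int,
) -> Optional[str]:
    """Give the primary model fallback_delay_ms to answer, then fail over."""
    try:
        return await asyncio.wait_for(primary(prompt), timeout=fallback_delay_ms / 1000)
    except (asyncio.TimeoutError, ConnectionError):
        # Primary was too slow or unreachable (ConnectionError stands in for an
        # outage or rate limit in this sketch).
        if fallback is None:
            return None  # no fallback configured: the caller hears a silent pause
        return await fallback(prompt)


if __name__ == "__main__":
    slow_primary = make_model("primary", latency_s=0.5)
    fast_fallback = make_model("fallback", latency_s=0.05)
    reply = asyncio.run(generate_turn("hello", slow_primary, fast_fallback, fallback_delay_ms=200))
    print(reply)  # fallback: hello
```

Note that `asyncio.wait_for` cancels the primary request when the delay expires; a production system might instead let it finish and discard the late result.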

Response Behavior

Response Eagerness

Controls how quickly the agent begins generating a response after the caller’s last word. A higher eagerness value means the agent starts responding sooner, even if the caller might still be speaking.
Value | Behavior
--- | ---
Low | Agent waits longer to confirm the caller has finished speaking. Reduces interruptions. May feel slow.
Medium | Balanced. Suitable for most conversational agents.
High | Agent begins responding quickly. Better for callers who speak in short, direct phrases. May occasionally interrupt callers who pause mid-sentence.
Tune response eagerness together with the silence timeout and barge-in settings, then use audio testing to find the right feel for your use case.
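As a simplified model, response eagerness can be thought of as a silence threshold: the agent treats the caller’s turn as finished once the trailing silence reaches it. The millisecond values below are hypothetical placeholders, not DialNexa’s actual settings, and real turn detection may rely on more signals than silence alone. `Eagerness` and `caller_finished` are illustrative names.

```python
from enum import Enum


class Eagerness(Enum):
    # Hypothetical end-of-turn silence thresholds in milliseconds, for
    # illustration only. Lower eagerness waits longer before deciding the
    # caller has finished speaking.
    LOW = 1200
    MEDIUM = 700
    HIGH = 350


def caller_finished(trailing_silence_ms: int, eagerness: Eagerness) -> bool:
    """Return True once the caller's trailing silence reaches the threshold."""
    return trailing_silence_ms >= eagerness.value


if __name__ == "__main__":
    thinking_pause_ms = 500  # caller pauses mid-sentence, then keeps talking
    for level in Eagerness:
        finished = caller_finished(thinking_pause_ms, level)
        verdict = "responds (interrupts)" if finished else "keeps waiting"
        print(f"{level.name}: {verdict}")
```

With these assumed thresholds, a 500 ms thinking pause ends the caller’s turn only at High, which is the mid-sentence interruption risk described in the table.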

Predictive Preprocessing

Predictive preprocessing pre-generates likely agent responses before the caller finishes speaking, using partial transcription. When enabled:
  1. As the caller speaks, DialNexa transcribes partial utterances in real time.
  2. The LLM pre-generates candidate responses based on the partial transcript.
  3. When the caller finishes speaking and the final transcript is confirmed, the best-matching pre-generated response is used immediately. If no candidate matches, a fresh response is generated.
This shortens the gap between the end of the caller’s turn and the start of the agent’s response.
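The three steps above can be sketched as a small class that caches one candidate reply per partial transcript and reuses the closest one when the final transcript arrives. How DialNexa matches candidates is not documented; this sketch assumes a plain text-similarity check using Python’s `difflib.SequenceMatcher` and a hypothetical 0.9 threshold. `PredictiveResponder`, `on_partial`, and `on_final` are illustrative names, and `generate` stands in for the LLM call.

```python
import difflib
from typing import Callable, Optional


class PredictiveResponder:
    """Hypothetical sketch: pre-generate replies from partial transcripts."""

    def __init__(self, generate: Callable[[str], str], min_similarity: float = 0.9) -> None:
        self.generate = generate                # stand-in for the LLM call
        self.min_similarity = min_similarity    # assumed matching threshold
        self.candidates: dict[str, str] = {}    # partial transcript -> candidate reply
        self.llm_calls = 0                      # counts every generation, used or not

    def _call_llm(self, text: str) -> str:
        self.llm_calls += 1
        return self.generate(text)

    def on_partial(self, partial: str) -> None:
        # Steps 1-2: as partial transcripts arrive, pre-generate a candidate reply.
        key = partial.strip().lower()
        if key and key not in self.candidates:
            self.candidates[key] = self._call_llm(key)

    def on_final(self, final: str) -> str:
        # Step 3: reuse the most similar candidate if it clears the threshold,
        # otherwise generate a fresh reply from the final transcript.
        key = final.strip().lower()
        best_reply: Optional[str] = None
        best_score = 0.0
        for partial, reply in self.candidates.items():
            score = difflib.SequenceMatcher(None, partial, key).ratio()
            if score > best_score:
                best_reply, best_score = reply, score
        self.candidates.clear()  # candidates only apply to the current turn
        if best_reply is not None and best_score >= self.min_similarity:
            return best_reply
        return self._call_llm(key)


if __name__ == "__main__":
    responder = PredictiveResponder(lambda text: f"reply to '{text}'")
    for partial in ["what are", "what are your hours", "what are your opening hours"]:
        responder.on_partial(partial)
    print(responder.on_final("What are your opening hours"))
    print(responder.llm_calls)
```

In the example run, three candidates are generated and one is reused, so no fresh call is needed when the caller finishes; the two unused candidates illustrate the extra token consumption that predictive preprocessing incurs.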
Setting | When to Use
--- | ---
Enabled | High-volume agents where latency reduction matters and callers ask predictable questions
Disabled | Agents where caller inputs are highly variable and pre-generated candidates rarely match, or where the prompt is complex and pre-generation accuracy is low
Predictive preprocessing increases LLM token consumption because it generates candidate responses that may never be used. Factor this into cost estimates for high-volume deployments.
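One rough way to estimate that overhead: with predictive preprocessing disabled, each turn needs one LLM call; with it enabled, each turn makes some number of pre-generation calls plus a fresh call whenever no candidate matches. The sketch below turns that simplified model into extra tokens per day. All inputs (calls per day, turns per call, candidates per turn, hit rate, tokens per call) are assumptions you would replace with figures from your own deployment, and the model ignores provider-specific pricing.

```python
def extra_tokens_per_day(
    calls_per_day: int,
    turns_per_call: int,
    candidates_per_turn: float,
    hit_rate: float,
    tokens_per_llm_call: int,
) -> float:
    """Rough extra LLM tokens per day caused by predictive preprocessing.

    Simplified, hypothetical cost model:
      disabled: 1 LLM call per turn
      enabled:  candidates_per_turn pre-generation calls, plus 1 fresh call on
                the share of turns (1 - hit_rate) where no candidate matches
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    calls_enabled = candidates_per_turn + (1 - hit_rate)
    extra_calls_per_turn = calls_enabled - 1  # compared with 1 call when disabled
    return calls_per_day * turns_per_call * extra_calls_per_turn * tokens_per_llm_call


if __name__ == "__main__":
    extra = extra_tokens_per_day(
        calls_per_day=1000,
        turns_per_call=8,
        candidates_per_turn=3,
        hit_rate=0.6,
        tokens_per_llm_call=1500,
    )
    print(f"~{extra / 1e6:.1f}M extra tokens per day")
```

For example, 1,000 calls a day at 8 turns each, 3 candidates per turn, a 60% hit rate, and 1,500 tokens per call works out to 2.4 extra LLM calls per turn, or roughly 28.8 million extra tokens per day.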

Transcriber Settings

Transcriber Model

The ASR (Automatic Speech Recognition) model used to transcribe caller audio to text. The available models depend on the agent’s configured language.
Model Type | Trade-off
--- | ---
Standard | Fast transcription, suitable for clear audio
Enhanced | Higher accuracy for accented speech, noisy environments, or domain-specific terminology

Transcriber Language

Inherited from the agent language set in the Agent Tab and shown here for reference only. To change it, update the language in the Agent Tab.

Save Changes

Click Save to save your changes to the current draft. Changes take effect only when the draft is published as a new version.