Engine Tab - DialNexa Documentation

The Engine Tab controls the core AI processing behavior of the agent. It covers the primary LLM settings, the fallback model configuration for resilience, the response behavior settings that affect how quickly and reliably the agent responds during a call, and the transcriber settings.

LLM Settings

Primary Model

The LLM used to generate the agent's response on every turn. Select a model from the dropdown. The models available depend on your plan. Higher-capability models generally produce better responses but may have higher latency.

Model Tier	Trade-off
Fast models	Lower latency, suitable for most conversational tasks
Standard models	Balanced latency and capability
Advanced models	Highest capability for complex reasoning, higher latency

Use a fast model for simple task agents (appointment booking, FAQ handling). Reserve advanced models for agents that require multi-step reasoning, tool chaining, or nuanced decision-making.

Temperature

Controls the randomness of LLM outputs. Range: 0.0 to 1.0.

Value	Effect
`0.0`	Fully deterministic. The model always produces the most likely response. Least creative.
`0.1 - 0.3`	Low randomness. Good for task agents that must follow instructions precisely.
`0.5 - 0.7`	Moderate randomness. Good for conversational agents that need natural variation.
`0.8 - 1.0`	High randomness. May produce unexpected or off-script responses. Not recommended for production agents.

High temperature values can cause the agent to deviate from its instructions, generate inconsistent responses, or trigger the wrong functions. Use a temperature above 0.7 only after extensive simulation testing.
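To build intuition for the table above, the sketch below shows the common way temperature is applied during sampling: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. This is a general illustration, not DialNexa's implementation. The function name and the logit values are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` at the given temperature."""
    if temperature <= 0.0:
        # Temperature 0 is treated as greedy decoding: all mass on the top token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (0.0, 0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability shifts from the top candidate toward the alternatives, which is why higher values make off-script responses more likely.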

Fallback Model

A secondary LLM used when the primary model is unavailable (e.g., provider outage, rate limit) or exceeds the fallback delay threshold. In either case, the fallback model generates the response so that the turn does not fail. Select a model that is reliable and fast. The fallback is a resilience mechanism, not an upgrade path, so it is typically a simpler and faster model than the primary.

Fallback Delay

The number of milliseconds to wait for the primary model before switching to the fallback model. If the primary model does not respond within this window, the fallback model processes the turn.

Setting	Effect
Low (500ms)	Switches to fallback quickly. Reduces caller-perceived latency on slow primary model responses, but may unnecessarily use the fallback on momentary primary delays.
Medium (1000-2000ms)	Balanced. Gives the primary model reasonable time before failing over.
High (3000ms+)	Rarely uses fallback. Caller waits longer before a fallback fires.

If no fallback model is configured, a turn that times out on the primary model results in a silent pause on the call. Configuring a fallback model is strongly recommended for production agents.
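The failover behavior described above can be sketched as a timeout around the primary model call. This is a minimal illustration under stated assumptions, not DialNexa's implementation: `generate_with_fallback`, `slow_primary`, and `fast_fallback` are hypothetical names, and the models are simulated with sleeps.

```python
import asyncio

async def generate_with_fallback(primary, fallback, prompt, fallback_delay_ms):
    """Try `primary`; switch to `fallback` on timeout or connection error.

    Returns None when the primary fails and no fallback is configured,
    which corresponds to the silent pause described above.
    """
    try:
        return await asyncio.wait_for(primary(prompt), timeout=fallback_delay_ms / 1000)
    except (asyncio.TimeoutError, ConnectionError):
        if fallback is None:
            return None
        return await fallback(prompt)

# Simulated models (hypothetical): the primary takes 200 ms, the fallback 10 ms.
async def slow_primary(prompt):
    await asyncio.sleep(0.2)
    return "primary: " + prompt

async def fast_fallback(prompt):
    await asyncio.sleep(0.01)
    return "fallback: " + prompt

async def main():
    # 50 ms delay: the primary is too slow, so the fallback answers.
    a = await generate_with_fallback(slow_primary, fast_fallback, "hello", 50)
    # 1000 ms delay: the primary finishes in time.
    b = await generate_with_fallback(slow_primary, fast_fallback, "hello", 1000)
    # 50 ms delay with no fallback configured: the turn produces nothing.
    c = await generate_with_fallback(slow_primary, None, "hello", 50)
    return a, b, c

results = asyncio.run(main())
print(results)
```

The sketch makes the trade-off in the table concrete: a shorter delay hands more turns to the fallback, while a longer delay keeps the caller waiting on a slow primary.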

Response Behavior

Response Eagerness

Controls how quickly the agent begins generating a response after the caller’s last word. A higher eagerness value means the agent starts responding sooner, even if the caller might still be speaking.

Value	Behavior
Low	Agent waits longer to confirm the caller has finished speaking. Reduces interruptions. May feel slow.
Medium	Balanced. Suitable for most conversational agents.
High	Agent begins responding quickly. Better for callers who speak in short, direct phrases. May occasionally interrupt callers who pause mid-sentence.

Tune response eagerness together with the silence timeout and barge-in settings, then use audio testing to confirm the result feels right for your use case.
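One way to picture the eagerness trade-off is as a silence threshold: the agent treats the caller's turn as finished once a pause reaches a certain length. The sketch below assumes this model purely for illustration. The threshold values, the `SILENCE_MS` mapping, and the function name are hypothetical and are not DialNexa's actual parameters.

```python
# Hypothetical silence thresholds in milliseconds for each eagerness level.
SILENCE_MS = {"low": 800, "medium": 500, "high": 300}

def words_heard_before_reply(gaps_ms, eagerness):
    """Return how many words the caller has said when the agent starts replying.

    gaps_ms[i] is the silence after word i. The final gap stands for the
    caller's true end of turn and should be larger than any threshold.
    """
    threshold = SILENCE_MS[eagerness]
    for i, gap in enumerate(gaps_ms):
        if gap >= threshold:
            return i + 1
    return len(gaps_ms)

# "I need to (400 ms pause) reschedule my appointment" with made-up gap values.
gaps = [80, 60, 400, 90, 70, 10_000]

for level in ("low", "medium", "high"):
    print(level, words_heard_before_reply(gaps, level))
```

With these made-up numbers, the high setting starts replying after the third word because the 400 ms mid-sentence pause exceeds its threshold, while the low and medium settings wait for the full six-word sentence.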

Predictive Preprocessing

Predictive preprocessing pre-generates likely agent responses before the caller finishes speaking, using partial transcription. When enabled:

1. As the caller speaks, DialNexa transcribes partial utterances in real time.
2. The LLM pre-generates candidate responses based on the partial transcript.
3. When the caller finishes speaking and the final transcript is confirmed, the best matching pre-generated response is used immediately, or a fresh response is generated if no candidate matches.

This reduces the gap between caller turn end and agent response start.

Setting	When to Enable
Enabled	High-call-volume agents where latency reduction matters and callers ask predictable questions
Disabled	Agents where caller inputs are highly variable and pre-generation rarely matches, or where the prompt is complex and pre-generation accuracy is low

Predictive preprocessing increases LLM token consumption because candidate responses are generated that may not be used. Factor this into cost estimates for high-volume deployments.
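The flow described in this section, and the extra LLM usage it causes, can be sketched as a small candidate cache. This is an illustration under assumptions, not DialNexa's implementation: the `PredictiveResponder` class is hypothetical, the LLM is replaced by a stand-in function, and "best matching" is simplified to an exact match on the normalized transcript.

```python
def normalize(text):
    """Lowercase and drop punctuation so near-identical transcripts compare equal."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

class PredictiveResponder:
    def __init__(self, generate):
        self.generate = generate  # stand-in for an LLM call
        self.candidates = {}      # normalized partial transcript -> response
        self.llm_calls = 0        # counts every generation, used or not

    def _call(self, transcript):
        self.llm_calls += 1
        return self.generate(transcript)

    def on_partial(self, partial):
        """Pre-generate a candidate response for a partial transcript."""
        key = normalize(partial)
        if key not in self.candidates:
            self.candidates[key] = self._call(partial)

    def on_final(self, final):
        """Return (response, reused): a cached candidate or a fresh response."""
        key = normalize(final)
        reused = key in self.candidates
        response = self.candidates[key] if reused else self._call(final)
        self.candidates.clear()  # candidates only apply to the current turn
        return response, reused

responder = PredictiveResponder(lambda t: "reply to: " + normalize(t))
for partial in ["What are", "What are your", "What are your hours"]:
    responder.on_partial(partial)

response, reused = responder.on_final("What are your hours?")
print(response, reused, responder.llm_calls)
```

In this run the final transcript matches the last partial, so the response is reused, but three generations were made to serve one turn. The two unused candidates are the additional token consumption noted above.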

Transcriber Settings

Transcriber Model

The ASR (Automatic Speech Recognition) model used to transcribe caller audio to text. The available models depend on the agent’s configured language.

Model Type	Trade-off
Standard	Fast transcription, suitable for clear audio
Enhanced	Higher accuracy for accented speech, noisy environments, or domain-specific terminology

Transcriber Language

Inherited from the agent language set in the Agent Tab. Displayed here for reference. To change it, update the language in the Agent Tab.

Save Changes

Click Save to store your changes in the current draft. Changes take effect when the draft is published as a new version.
