The Engine Tab controls the core AI processing behavior of the agent. This includes the primary LLM settings, fallback model configuration for resilience, and performance settings that affect how quickly and reliably the agent responds during a call.
LLM Settings
Primary Model
The LLM used to generate the agent's response on every turn. Select a model from the dropdown; the models available depend on your plan. Higher-capability models produce better responses but may have higher latency.
| Model Tier | Trade-off |
|---|---|
| Fast models | Lower latency, suitable for most conversational tasks |
| Standard models | Balanced latency and capability |
| Advanced models | Highest capability for complex reasoning, higher latency |
Use a fast model for simple task agents (appointment booking, FAQ handling). Reserve advanced models for agents that require multi-step reasoning, tool chaining, or nuanced decision-making.
Temperature
Controls the randomness of LLM outputs. Range: 0.0 to 1.0.
| Value | Effect |
|---|---|
| 0.0 | Fully deterministic. The model always produces the most likely response. Least creative. |
| 0.1 - 0.3 | Low randomness. Good for task agents that must follow instructions precisely. |
| 0.5 - 0.7 | Moderate randomness. Good for conversational agents that need natural variation. |
| 0.8 - 1.0 | High randomness. May produce unexpected or off-script responses. Not recommended for production agents. |
High temperature values can cause the agent to deviate from its instructions, generate inconsistent responses, or trigger the wrong functions. Set the temperature above 0.7 only after extensive simulation testing.
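The guidance above can be turned into a simple pre-publish check. The following is a minimal sketch, not a DialNexa API: the function name `review_temperature` is hypothetical, and the only rules it encodes are the 0.0 to 1.0 range and the 0.7 caution threshold stated in this section.

```python
def review_temperature(value):
    """Return a list of warnings for a proposed temperature setting.

    Hypothetical helper: encodes only the documented range (0.0 to 1.0)
    and the documented caution threshold (above 0.7).
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"temperature must be between 0.0 and 1.0, got {value}")
    warnings = []
    if value > 0.7:
        warnings.append("above 0.7: run simulation tests before publishing")
    return warnings


low_warnings = review_temperature(0.2)   # task-agent range, no warnings expected
high_warnings = review_temperature(0.9)  # high randomness, one warning expected
```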
Fallback Model
Fallback Model
A secondary LLM used when the primary model is unavailable (e.g., provider outage, rate limit) or exceeds the fallback delay threshold. The fallback model processes the response instead of failing the turn.
Select a model that is reliable and fast. The fallback is a resilience mechanism, not an upgrade path. Typically choose a model that is simpler and faster than the primary.
Fallback Delay
The number of milliseconds to wait for the primary model before switching to the fallback model. If the primary model does not respond within this window, the fallback model processes the turn.
| Setting | Effect |
|---|---|
| Low (500 ms) | Switches to fallback quickly. Reduces caller-perceived latency on slow primary model responses, but may unnecessarily use the fallback on momentary primary delays. |
| Medium (1000-2000 ms) | Balanced. Gives the primary model reasonable time before failing over. |
| High (3000 ms+) | Rarely uses fallback. Caller waits longer before a fallback fires. |
If no fallback model is configured, a turn that times out on the primary model results in a silent pause on the call. Configuring a fallback model is strongly recommended for production agents.
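The failover behavior described above can be illustrated with a timeout wrapper. This is a minimal sketch of the general pattern, not DialNexa's implementation: `generate_with_fallback`, `slow_primary`, and `fast_fallback` are hypothetical names, and the models are stand-in coroutines rather than real LLM calls.

```python
import asyncio


async def generate_with_fallback(primary, fallback, prompt, fallback_delay_ms):
    """Return (model_label, text), failing over on timeout or error."""
    try:
        # Give the primary model at most `fallback_delay_ms` to answer.
        text = await asyncio.wait_for(
            primary(prompt), timeout=fallback_delay_ms / 1000
        )
        return "primary", text
    except Exception:
        # asyncio.TimeoutError (slow primary) or a provider error:
        # the fallback model processes the turn instead of failing it.
        text = await fallback(prompt)
        return "fallback", text


async def slow_primary(prompt):
    await asyncio.sleep(0.2)  # simulates a primary model taking 200 ms
    return f"primary: {prompt}"


async def fast_fallback(prompt):
    return f"fallback: {prompt}"


# With a 50 ms delay the 200 ms primary is cut off and the fallback answers.
short_delay = asyncio.run(
    generate_with_fallback(slow_primary, fast_fallback, "hi", 50)
)
# With a 1000 ms delay the primary has time to finish.
long_delay = asyncio.run(
    generate_with_fallback(slow_primary, fast_fallback, "hi", 1000)
)
```

The two runs show the trade-off in the table: a short delay bounds how long the caller waits but sends more turns to the fallback, while a long delay keeps turns on the primary at the cost of a longer worst-case wait.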
Response Behavior
Response Eagerness
Controls how quickly the agent begins generating a response after the caller’s last word. A higher eagerness value means the agent starts responding sooner, even if the caller might still be speaking.
| Value | Behavior |
|---|---|
| Low | Agent waits longer to confirm the caller has finished speaking. Reduces interruptions. May feel slow. |
| Medium | Balanced. Suitable for most conversational agents. |
| High | Agent begins responding quickly. Better for callers who speak in short, direct phrases. May occasionally interrupt callers who pause mid-sentence. |
Tune response eagerness in conjunction with the silence timeout and barge-in settings. Test the result with audio testing to find the right feel for your use case.
Predictive Preprocessing
Predictive preprocessing pre-generates likely agent responses before the caller finishes speaking, using partial transcription. When enabled:
- As the caller speaks, DialNexa transcribes partial utterances in real time.
- The LLM pre-generates candidate responses based on the partial transcript.
- When the caller finishes speaking and the final transcript is confirmed, the best matching pre-generated response is used immediately, or a fresh response is generated if no candidate matches.
This reduces the gap between caller turn end and agent response start.
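The selection step can be sketched as follows. This is an illustration under stated assumptions, not DialNexa's matching logic: the source does not say how candidates are matched, so the sketch compares each candidate's partial transcript to the final transcript with `difflib.SequenceMatcher`, and the function name `pick_response` and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def pick_response(final_transcript, candidates, generate, min_similarity=0.9):
    """Choose a pre-generated response or fall back to fresh generation.

    candidates: list of (partial_transcript, pregenerated_response) pairs.
    generate:   callable that produces a fresh response from a transcript.
    Returns (response, reused), where `reused` is True on a candidate hit.
    """
    best_score, best_response = 0.0, None
    for partial, response in candidates:
        # Similarity between what was heard so far and the final transcript.
        score = SequenceMatcher(
            None, partial.lower(), final_transcript.lower()
        ).ratio()
        if score > best_score:
            best_score, best_response = score, response
    if best_response is not None and best_score >= min_similarity:
        return best_response, True
    # No candidate is close enough: generate a fresh response.
    return generate(final_transcript), False


candidates = [
    ("I want to book an", "Sure, what would you like to book?"),
    ("I want to book an appointment", "Happy to help. What day works for you?"),
]
hit = pick_response("I want to book an appointment", candidates, lambda t: "fresh")
miss = pick_response("cancel my order", candidates, lambda t: "fresh: " + t)
```

A stricter threshold reuses fewer candidates but lowers the risk of answering a question the caller did not finish asking.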
| Setting | When to Enable |
|---|---|
| Enabled | High-call-volume agents where latency reduction matters and callers ask predictable questions |
| Disabled | Agents where caller inputs are highly variable and pre-generation rarely matches, or where the prompt is complex and pre-generation accuracy is low |
Predictive preprocessing increases LLM token consumption because candidate responses are generated that may not be used. Factor this into cost estimates for high-volume deployments.
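For a rough sense of the cost impact, the extra consumption can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a billing formula: the function name, the candidate count, the hit rate, and the token figures are all hypothetical inputs, and it counts output tokens only.

```python
def estimated_output_tokens(turns, tokens_per_response, candidates_per_turn, hit_rate):
    """Estimate output tokens with predictive preprocessing enabled.

    Assumes every turn pre-generates `candidates_per_turn` responses and
    that a miss (no matching candidate) triggers one fresh response.
    """
    speculative = turns * candidates_per_turn * tokens_per_response
    fresh = turns * (1 - hit_rate) * tokens_per_response
    return speculative + fresh


# Hypothetical workload: 1,000 turns, 60 tokens per response,
# 3 candidates per turn, half of the turns matched by a candidate.
with_prediction = estimated_output_tokens(1000, 60, 3, 0.5)
baseline = 1000 * 60  # one response per turn with the feature disabled
multiplier = with_prediction / baseline
```

Under these assumed inputs the feature uses 3.5 times the baseline output tokens; substitute measured hit rates and candidate counts from your own traffic before relying on the estimate.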
Transcriber Settings
Transcriber Model
The ASR (Automatic Speech Recognition) model used to transcribe caller audio to text. The available models depend on the agent’s configured language.
| Model Type | Trade-off |
|---|---|
| Standard | Fast transcription, suitable for clear audio |
| Enhanced | Higher accuracy for accented speech, noisy environments, or domain-specific terminology |
Transcriber Language
Inherited from the agent language set in the Agent Tab. Displayed here for reference. To change it, update the language in the Agent Tab.
Save Changes
Click Save to write your changes to the current draft. Changes take effect when the draft is published as a new version.