Skip to main content
The LLM Tab provides detailed control over how the language model processes each turn of the conversation. It overlaps with the Engine Tab for model and temperature settings but focuses specifically on context management, token limits, and which tools the LLM can access.
The LLM Tab and Engine Tab share some settings (model selection, temperature). Changes made in either tab affect the same underlying configuration. Check both tabs when auditing an agent’s LLM configuration.

Model Settings

Model

The LLM used for all agent response generation. This is the same setting as Primary Model in the Engine Tab. Changing it here updates the Engine Tab as well. See Engine Tab for guidance on selecting the right model for your use case.

Temperature

Controls response randomness. Same as the temperature setting in the Engine Tab. Range: 0.0 to 1.0. See Engine Tab for the full explanation.

Context Window Behavior

Context Window Size

The number of recent conversation turns included in the LLM prompt at each step. Including more turns gives the model more context to work with, which improves coherence in long conversations. Including too many turns increases token consumption and latency.
SettingEffect
Small (last 5 turns)Low token consumption, low latency. Suitable for short, transactional calls.
Medium (last 10-15 turns)Balanced. Good for most conversational agents.
Large (last 20+ turns or full history)High token consumption. Suitable for complex, long-running support calls where full history is needed.
Most LLMs have a maximum context window (e.g., 128k tokens). DialNexa automatically truncates the oldest turns if the conversation exceeds the model’s context limit. Configure the context window size to stay well within this limit for your expected call length and prompt size.

System Prompt Placement

Controls where in the LLM prompt the system instructions appear relative to the conversation history:
  • Top: System prompt appears before the conversation history (standard for most models).
  • Bottom: System prompt appears after the conversation history. Some models perform better with instructions at the end. Test with your specific model if you encounter instruction-following issues.

Include Call Metadata in Context

When enabled, the LLM receives structured metadata about the call in the prompt context:
  • Call ID
  • Caller phone number (for inbound calls)
  • Call start time
  • Variables passed at call initiation
This lets the LLM reference call metadata in its responses (e.g., “I can see you’re calling from the number we have on file.”) without you needing to inject this data manually into the variables.

Tool and Function Access

Available Tools

The list of tools and functions the LLM can call during the conversation. This list is populated from what you have attached in the Tools Tab. Use the LLM Tab to control which tools are active in the LLM’s tool-use context.
ControlAction
Toggle on/offEnable or disable a specific tool for this agent version without removing it entirely. Disabled tools are not shown to the LLM and will not be called.
ReorderDrag tools to reorder them. Tool order may affect which tool the LLM selects when multiple tools are applicable. Place the most commonly used tools first.

Tool Call Mode

Controls how the LLM decides when to call tools:
ModeBehavior
AutoThe LLM decides when a tool call is appropriate based on the conversation. Standard mode.
RequiredThe LLM must call at least one tool on every turn. Use this for agents where every caller input requires a backend lookup.
NoneThe LLM cannot call any tools, even if tools are attached. Use for testing prompt behavior without tool calls.
Setting tool call mode to Required on an agent with multiple tools can cause the LLM to make unnecessary tool calls on every turn. Use this mode only when every turn genuinely requires a tool lookup.

Max Tool Calls Per Turn

The maximum number of sequential tool calls the LLM can make in a single turn before it must generate a response. Prevents runaway tool call chains. Default: 5. Lower this if tool calls are taking too long and causing excessive latency. Raise it if your use case requires chaining multiple lookups before responding.

Save Changes

Click Save to save to the current draft. Publish to apply to live calls.