Agent and Conversation Architecture
Agent
An AI phone agent is the core entity in DialNexa. An agent has a configured identity (system prompt, voice, LLM, tools) and is attached to one or more phone numbers or web call endpoints. When a call is connected to an agent, the agent handles the conversation autonomously.
Audio Cache
A feature that pre-generates TTS audio for static phrases and stores it for instant playback. When the agent needs to say a cached phrase, it plays the pre-generated audio instead of calling the TTS API, reducing latency and TTS costs. See Latency.
Batch Call / Batch Campaign
A batch campaign initiates a large number of outbound calls from a contact list, typically for surveys, notifications, or outbound sales. Each call in the batch runs the configured agent. Campaigns support configurable concurrency, retry logic, and scheduling.
Concurrency
The number of simultaneous active calls an account can handle at any moment. Exceeding the concurrency limit causes new call attempts to be queued or rejected. Concurrency limits are plan-dependent. See Concurrency Tiers.
Conversation Flow
A node-based visual architecture for building agents. The conversation is modeled as a graph of nodes (states) connected by transitions (edges with conditions). The agent navigates from node to node based on what the caller says. See the Conversation Flow documentation.
Dynamic Variable
A placeholder in a prompt or agent configuration that is replaced with a real value at call time. Syntax: {{variable_name}}. For example, {{caller_name}} is replaced with the caller’s name when the call starts, if that value is provided via the API or a lookup.
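The substitution step for dynamic variables can be sketched as a regular-expression replacement. This is a minimal illustration, not DialNexa's implementation: `render` is a hypothetical helper, and leaving a placeholder untouched when no value is supplied is an assumption.

```python
import re

# Matches {{variable_name}}: a name made of letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} with its value; unknown names are left as-is (assumption)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


prompt = "Greet {{caller_name}} and confirm the appointment on {{appointment_date}}."
print(render(prompt, {"caller_name": "Asha"}))
# prints: Greet Asha and confirm the appointment on {{appointment_date}}.
```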
End Node
A special node in a conversation flow that ends the call when reached. Every complete conversation flow must have at least one End Node.
Global Node
A node in a conversation flow that is active throughout the entire conversation, regardless of which other node the agent is currently on. Used for universal behaviors such as “if the caller asks to speak to a human at any point, transfer them.”
Multi-Prompt Architecture
An agent architecture where different stages of the conversation use different system prompts. For example, an agent might use a greeting prompt, a qualification prompt, and a booking prompt as separate stages. This provides more control than a single prompt for complex flows.
Node
A discrete state in a conversation flow. Each node has a prompt that guides the agent’s behavior while in that state. The agent remains on a node until a transition condition is satisfied.
Post-Call Analysis
A feature that automatically extracts structured data from call transcripts after the call ends. You define extraction fields (e.g., “Did the caller book an appointment?” as a boolean), and DialNexa uses an LLM to populate those fields from the transcript. Results are available via the dashboard and API.
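Conceptually, each extraction field pairs a question with an expected type, and the LLM's answer is checked against that type before it is stored. The sketch below is hypothetical: the `ExtractionField` structure, its attribute names, and the JSON shape of the LLM output are assumptions for illustration, not DialNexa's API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractionField:
    name: str            # key the result is stored under (hypothetical)
    question: str        # instruction the LLM answers from the transcript
    expected_type: type  # Python type the answer must have


FIELDS = [
    ExtractionField("booked_appointment", "Did the caller book an appointment?", bool),
    ExtractionField("preferred_day", "Which day did the caller prefer?", str),
]


def validate(llm_json: str, fields: List[ExtractionField]) -> dict:
    """Parse the LLM's JSON answer and keep only declared fields of the declared type.

    Fields that are missing or have the wrong type are stored as None.
    """
    raw = json.loads(llm_json)
    result = {}
    for f in fields:
        value = raw.get(f.name)
        result[f.name] = value if isinstance(value, f.expected_type) else None
    return result


print(validate('{"booked_appointment": true, "preferred_day": 3, "notes": "x"}', FIELDS))
# prints: {'booked_appointment': True, 'preferred_day': None}
```

Rejecting mistyped values, rather than coercing them, keeps a malformed LLM answer from silently becoming a wrong boolean in reports.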
Single-Prompt Architecture
The simplest agent architecture: one system prompt governs the entire conversation. Best for straightforward, linear interactions.
Transition Condition
In a conversation flow, the rule that determines when the agent moves from one node to another. Conditions are evaluated by the LLM based on the conversation context. For example: “The caller has confirmed their date and time” triggers the transition from a collection node to a confirmation node.
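The node, transition, global node, and end node concepts can be sketched as a small graph data structure. This is a simplified model under stated assumptions: plain keyword predicates stand in for the LLM's evaluation of transition conditions, and the Global Node is modeled as a set of transitions checked from every node. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Stand-in for the LLM: in DialNexa the condition is judged by the LLM from the
# conversation context; here it is a plain predicate over the latest caller utterance.
Condition = Callable[[str], bool]
Transition = Tuple[Condition, str]  # (condition, target node name)


@dataclass
class Node:
    name: str
    prompt: str  # guides the agent while on this node
    transitions: List[Transition] = field(default_factory=list)
    is_end: bool = False  # reaching an end node ends the call


@dataclass
class Flow:
    nodes: Dict[str, Node]
    start: str
    # Global-node behavior, modeled as transitions checked from every node.
    global_transitions: List[Transition] = field(default_factory=list)

    def step(self, current: str, utterance: str) -> str:
        """Return the next node; stay on the current node if no condition is satisfied."""
        for condition, target in self.global_transitions + self.nodes[current].transitions:
            if condition(utterance):
                return target
        return current


flow = Flow(
    nodes={
        "collect": Node("collect", "Ask for a date and time.",
                        [(lambda u: "confirm" in u.lower(), "confirm")]),
        "confirm": Node("confirm", "Read back the booking.",
                        [(lambda u: "yes" in u.lower(), "goodbye")]),
        "goodbye": Node("goodbye", "Thank the caller.", is_end=True),
        "transfer": Node("transfer", "Transfer to a human.", is_end=True),
    },
    start="collect",
    global_transitions=[(lambda u: "human" in u.lower(), "transfer")],
)

print(flow.step("collect", "I confirm Tuesday at 3 pm"))  # prints: confirm
print(flow.step("confirm", "Can I speak to a human?"))    # prints: transfer
```

Checking global transitions first means a request for a human wins regardless of which node the agent is on, which mirrors the Global Node description.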
Workspace
A logical organizational unit within DialNexa. Each workspace has its own agents, phone numbers, call history, and settings. A user can belong to multiple workspaces. Teams often use separate workspaces for different products, clients, or environments (development vs. production). See Workspace.
Voice and Audio
Cartesia
One of the TTS (text-to-speech) providers available in DialNexa. Known for very low synthesis latency and natural-sounding voices. Model: sonic-2. Recommended when latency is the primary optimization target.
ElevenLabs
One of the TTS providers available in DialNexa. Known for highly expressive, human-like voice quality. Default model: eleven_flash_v2_5. Recommended when voice quality and naturalness are the primary priority.
SmallestAI
An India-focused TTS provider available in DialNexa. Optimised for Indian languages with a broad selection of Indian voice personas. Models: lightning, lightning-large, lightning-v2. Best for agents targeting Indian callers.
Sarvam AI
An India-focused TTS provider available in DialNexa. Strong support for Indian English (en-IN) and regional Indian languages. Model: bulbul:v2. Recommended for Indian-market agents and Indian English callers.
TTS (Text-to-Speech)
The process of converting the agent’s text response into spoken audio. DialNexa uses streaming TTS, meaning audio playback begins before the full response is synthesized, reducing perceived latency.
Voice Activity Detection (VAD)
The algorithm that determines when the caller has stopped speaking. VAD triggers the end-of-speech signal, which starts the transcription process. The VAD sensitivity setting controls how long a pause must be before speech is considered ended.
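The sensitivity trade-off can be illustrated with a simple energy-threshold detector. This is a textbook technique swapped in for illustration only; DialNexa's actual VAD algorithm is not described here, and the function and parameter names (`threshold`, `frame_ms`, `min_silence_ms`) are assumptions.

```python
from typing import List, Optional


def end_of_speech_frame(energies: List[float], threshold: float,
                        frame_ms: int, min_silence_ms: int) -> Optional[int]:
    """Return the frame index at which end-of-speech is declared, or None.

    A frame counts as speech when its energy is at or above `threshold`.
    End-of-speech fires once silence after speech lasts at least `min_silence_ms`.
    """
    frames_needed = max(1, min_silence_ms // frame_ms)
    heard_speech = False
    silent_frames = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            heard_speech = True
            silent_frames = 0  # any speech resets the pause timer
        elif heard_speech:
            silent_frames += 1
            if silent_frames >= frames_needed:
                return i
    return None


# 20 ms frames: leading silence, two speech frames, then a pause.
frames = [0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.1, 0.1]
print(end_of_speech_frame(frames, threshold=0.5, frame_ms=20, min_silence_ms=60))   # prints: 5
print(end_of_speech_frame(frames, threshold=0.5, frame_ms=20, min_silence_ms=120))  # prints: None
```

A longer required pause tolerates mid-sentence hesitation but delays the agent's reply; a shorter one responds faster but risks cutting the caller off.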
Language Models
DeepSeek V3
DeepSeek’s cost-optimised LLM available in DialNexa. Strong instruction-following at low cost. Good for high-volume deployments where cost is a priority.
GPT-4o
OpenAI’s full-capability multimodal model. More capable than GPT-4o Mini for complex reasoning, but with higher latency. Use when accuracy is more important than speed.
GPT-4o Mini
OpenAI’s fast, lightweight model and the default LLM in DialNexa. Balances speed and quality well for most voice agent use cases. Recommended as the starting point for all agents.
Llama 4 (Groq)
Meta’s Llama 4 model running on Groq’s inference hardware, available in DialNexa. Extremely fast token generation with low cost per token. Best for high-volume, latency-sensitive deployments with simple, well-tested prompts.
LLM (Large Language Model)
The AI model that processes the conversation context and generates the agent’s responses. DialNexa supports models from OpenAI (GPT-4o Mini, GPT-4o, and the GPT-4.1 family), Groq (Llama 4), DeepSeek (DeepSeek V3), and Google (Gemini). The full list is dynamic and can be retrieved via GET /v1/llms.
Temperature
A parameter that controls the randomness of LLM output. Lower values (0.0-0.3) produce consistent, focused responses; higher values (0.7-1.0) produce more varied, creative responses. For voice agents, the default is typically 0.5-0.7.
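Temperature is commonly applied by dividing the model's raw token scores (logits) by T before the softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T). The sketch below illustrates that standard formula to show why low values concentrate probability on the top token; individual providers may implement sampling differently, and the logits are made-up values.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw token scores (logits) into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for this formula")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As T rises, the top token's share of probability falls and the alternatives become more likely, which is why higher temperatures produce more varied responses.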
Context Window
The maximum number of tokens the LLM can process in a single request, including the system prompt and conversation history. A token is a fragment of text; in English, one token averages roughly three-quarters of a word. If a conversation exceeds the context window, older turns are truncated.
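The truncation behavior can be sketched as keeping the system prompt and filling the remaining budget with the most recent turns. Assumptions: `fit_to_window` and `word_count` are hypothetical helpers, a whitespace word count stands in for a real tokenizer, and DialNexa's exact truncation policy may differ.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Rough stand-in for a real tokenizer (assumption): one token per whitespace word."""
    return len(text.split())


def fit_to_window(system_prompt: str, turns: List[str], max_tokens: int,
                  count_tokens: Callable[[str], int] = word_count) -> List[str]:
    """Keep the system prompt plus the newest turns that fit; drop the oldest turns first."""
    budget = max_tokens - count_tokens(system_prompt)
    kept: List[str] = []
    for turn in reversed(turns):  # newest to oldest
        cost = count_tokens(turn)
        if cost > budget:
            break  # this turn and everything older is truncated
        kept.append(turn)
        budget -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order


history = ["Caller: hi there friend", "Agent: how can I help", "Caller: book Tuesday"]
print(fit_to_window("You are a receptionist.", history, max_tokens=10))
# prints: ['You are a receptionist.', 'Caller: book Tuesday']
```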
Transcription
Confidence Score
A score (0.0-1.0) assigned by the transcriber to each transcribed utterance, indicating how confident the model is in its transcription. Scores below 0.7 suggest the transcription may be inaccurate.
Deepgram Flux
The Deepgram transcription option available in the DialNexa dashboard (shown as “Deepgram Flux (English only)”). Uses the flux-general-en model for low-latency English streaming transcription. The underlying Deepgram engine also supports the nova-2 and nova-3 models, but Flux is the only Deepgram option selectable in the UI. Not suitable for non-English calls.
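When reviewing transcripts, the 0.7 threshold from the Confidence Score entry can be used to flag utterances for manual review. The `Utterance` record and its fields are hypothetical; the shape of DialNexa's transcript data is not described here.

```python
from dataclasses import dataclass
from typing import List

LOW_CONFIDENCE_THRESHOLD = 0.7  # value suggested in the Confidence Score entry


@dataclass
class Utterance:
    speaker: str
    text: str
    confidence: float  # 0.0-1.0, as reported by the transcriber


def flag_low_confidence(utterances: List[Utterance],
                        threshold: float = LOW_CONFIDENCE_THRESHOLD) -> List[Utterance]:
    """Return the utterances whose transcription may be inaccurate."""
    return [u for u in utterances if u.confidence < threshold]


transcript = [
    Utterance("caller", "I'd like to book for Tuesday", 0.94),
    Utterance("caller", "my booking number is four two seven", 0.58),
]
for u in flag_low_confidence(transcript):
    print(f"review: {u.text!r} ({u.confidence:.2f})")
```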
Soniox
A transcription provider available in DialNexa. Well-suited for Indian English and South Asian accents. Recommended when Deepgram models produce high error rates for your caller base.
STT (Speech-to-Text)
Synonymous with transcription. The process of converting the caller’s spoken audio into text that the LLM can process.
Transcriber
The component responsible for converting caller speech to text. The current dashboard selector supports Deepgram Flux for English and Soniox for multilingual or mixed-language calls.
Telephony
DID (Direct Inward Dial)
A telephone number that routes directly to a specific destination (in this case, a DialNexa agent) without going through a switchboard or IVR. Standard phone numbers used in DialNexa are DIDs.
DTMF (Dual-Tone Multi-Frequency)
The tones generated when a caller presses keys on their phone keypad. DialNexa agents can be configured to recognize and respond to DTMF input, which is useful for menus where the caller presses a number to select an option.
E.164
The international standard format for phone numbers: +[country code][number], e.g., +919876543210. All phone numbers in DialNexa must be in E.164 format.
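Before sending numbers to DialNexa, a quick client-side check can catch formatting mistakes. This sketch only validates the general E.164 shape (a leading "+", a country code that does not start with 0, at most 15 digits); `normalize` and `is_e164` are illustrative helpers, and full validation such as per-country number lengths is usually delegated to a library like Google's libphonenumber.

```python
import re

# E.164 shape: "+", a first digit 1-9 (country codes never start with 0), up to 15 digits total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")
SEPARATORS = re.compile(r"[\s().-]")


def normalize(raw: str) -> str:
    """Strip spaces, dots, dashes, and parentheses; does not add a missing country code."""
    return SEPARATORS.sub("", raw)


def is_e164(number: str) -> bool:
    """Check the E.164 shape only, not whether the number is actually assigned."""
    return E164_PATTERN.fullmatch(number) is not None


print(is_e164(normalize("+91 98765-43210")))  # prints: True
print(is_e164("09876543210"))                 # prints: False
```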
Inbound Call
A call initiated by an external caller to one of your DialNexa phone numbers. The agent answers and handles the conversation.
Outbound Call
A call initiated by your DialNexa agent (or via the API) to an external phone number. Used for follow-ups, notifications, surveys, and sales.
PSTN (Public Switched Telephone Network)
The traditional telephone network. DialNexa connects to the PSTN via telephony providers (Plivo by default, or your own carrier via a BYO SIP trunk) to make and receive real phone calls.
SIP (Session Initiation Protocol)
A signaling protocol used for voice and video calls over IP networks. DialNexa supports SIP trunk integration for enterprise telephony setups.
Telephony Provider
The underlying carrier that DialNexa uses to connect to the PSTN. The default provider is Plivo. You can also bring your own carrier via a BYO SIP trunk.
Transfer Call
A built-in DialNexa tool that transfers an active call to another phone number (e.g., a human agent). The call leaves the AI agent and connects to the transfer destination.
Voicemail Detection
DialNexa’s ability to detect when an outbound call connects to a voicemail system rather than a live person. When voicemail is detected, the call can be configured to leave a message, hang up, or mark the call for retry.
Webhooks and Integration
Webhook
An HTTP endpoint that receives real-time event notifications from DialNexa. Used to receive call lifecycle events (call.started, call.ended, call.failed) or to implement custom tool logic.
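A webhook receiver typically routes each notification by its event type. In this sketch the payload field names `event` and `call_id` are assumptions, not DialNexa's documented schema, and the HTTP server itself is omitted; in practice, verify the HMAC-SHA256 signature before dispatching.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]
HANDLERS: Dict[str, Handler] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one event type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


def dispatch(body: bytes) -> bool:
    """Parse a webhook body and route it by event type; returns False if unhandled."""
    payload = json.loads(body)
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        return False
    handler(payload)
    return True


ended_calls: List[str] = []


@on("call.ended")
def handle_call_ended(payload: dict) -> None:
    ended_calls.append(payload["call_id"])


dispatch(b'{"event": "call.ended", "call_id": "abc123"}')
print(ended_calls)  # prints: ['abc123']
```

Returning False for unknown event types lets the receiver acknowledge events it does not handle yet without failing.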
Custom Function
A tool type in DialNexa that calls a developer-supplied webhook URL during a conversation. The LLM can invoke it when needed (based on the tool description), passing arguments as a JSON payload. The webhook response is returned to the LLM.
HMAC-SHA256
A cryptographic signature algorithm used to sign DialNexa webhook payloads. Verify this signature on all incoming webhooks to ensure they are genuine. See Prevent Abuse.
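Verification recomputes the signature over the raw request body with the shared secret and compares it with the value the request carries. This sketch assumes the signature arrives as a hex-encoded HMAC-SHA256 digest of the raw body; the header that carries it, the secret format, and whether a timestamp is also signed are assumptions to check against DialNexa's webhook documentation.

```python
import hashlib
import hmac


def verify_signature(body: bytes, signature_hex: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare it in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, through timing, how many leading characters match
    return hmac.compare_digest(expected.encode(), signature_hex.strip().lower().encode())


secret = "whsec_example"  # hypothetical webhook secret
body = b'{"event": "call.ended"}'
good = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, good, secret))                         # prints: True
print(verify_signature(b'{"event": "call.failed"}', good, secret))  # prints: False
```

Compute the digest over the raw bytes before any JSON parsing, because re-serializing the payload can change whitespace or key order and break the match.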