Transcription Modes - DialNexa Documentation

Transcription mode controls how DialNexa converts the caller’s speech to text. The mode determines when the transcriber sends results to the LLM, which directly affects how quickly the agent responds after the caller finishes speaking. DialNexa uses Deepgram as its transcription provider, and different Deepgram models support different transcription modes.

The two modes

Streaming (word-by-word)

The transcriber emits partial results as words are recognized, before the caller has finished speaking. The system can begin processing earlier, potentially reducing time-to-first-byte of agent response.

Best for: low-latency applications, short utterances, turn-based conversations

Endpoint-based

The transcriber waits for a speech endpoint (a detected pause or end of utterance) before emitting a complete result. The transcript is more accurate but arrives later.

Best for: complex utterances, high-accuracy requirements, callers who speak in long sentences

How mode affects response latency

Response latency = transcription time + LLM processing time + TTS synthesis time + network

Transcription mode affects the first component.

Streaming mode: the transcriber begins delivering text while the caller is still speaking. The LLM pipeline can start processing partial input. When combined with a high Response Eagerness setting, the agent can start responding very quickly after the caller stops. Total perceived latency is lower.
Endpoint-based mode: the complete transcript arrives after the endpoint is detected. The LLM starts processing only after the full utterance is available. This adds 200 to 800 ms of latency in typical usage, depending on the endpoint detection sensitivity.
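
The latency breakdown above can be worked through numerically. The following is a minimal sketch, not DialNexa code: the class name, function name, and the per-component figures are assumptions chosen for illustration. Only the 200 to 800 ms endpoint overhead comes from this page.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Per-turn latency components in milliseconds (illustrative values only)."""
    transcription_ms: float
    llm_ms: float
    tts_ms: float
    network_ms: float

    def total_ms(self) -> float:
        # Response latency = transcription + LLM processing + TTS synthesis + network
        return self.transcription_ms + self.llm_ms + self.tts_ms + self.network_ms


def endpoint_range(streaming: LatencyBudget) -> tuple[float, float]:
    """Add the documented 200 to 800 ms endpoint overhead to a streaming budget."""
    base = streaming.total_ms()
    return base + 200, base + 800


# Assumed example figures, not measurements from DialNexa.
streaming = LatencyBudget(transcription_ms=100, llm_ms=400, tts_ms=150, network_ms=50)
low, high = endpoint_range(streaming)
print(f"streaming: {streaming.total_ms():.0f} ms")
print(f"endpoint-based: {low:.0f} to {high:.0f} ms")
```

With these assumed figures, streaming totals 700 ms and endpoint-based lands between 900 and 1500 ms. Replace the component values with your own measurements to estimate the difference for your deployment.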

Streaming mode can cause the agent to respond before the caller is fully done speaking. If Response Eagerness is set too high, the agent will interrupt callers mid-sentence. Tune Response Eagerness alongside transcription mode.

Deepgram models and mode support

Deepgram Model	Streaming	Endpoint-based	Notes
Nova-2	Yes	Yes	General purpose, high accuracy
Nova-2 (Medical)	No	Yes	Specialized vocabulary
Nova-2 (Phone Call)	Yes	Yes	Optimized for telephone audio
Whisper (via Deepgram)	No	Yes	Highest accuracy, higher latency
Base	Yes	Yes	Lower cost, lower accuracy

For most phone call deployments, use Nova-2 (Phone Call) in streaming mode. It is tuned for the audio characteristics of telephone calls (8 kHz audio, compression artifacts, speaker overlap) and supports streaming for low latency.
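
If you manage agent settings in your own tooling, the support table can be encoded as a lookup so that invalid model and mode combinations are caught early. This is a sketch under assumptions: the mode strings "streaming" and "endpoint" and the function names are hypothetical identifiers, not DialNexa or Deepgram API values.

```python
# Mode support per model, transcribed from the table above.
# The mode strings are hypothetical labels used only in this sketch.
MODE_SUPPORT = {
    "Nova-2": {"streaming", "endpoint"},
    "Nova-2 (Medical)": {"endpoint"},
    "Nova-2 (Phone Call)": {"streaming", "endpoint"},
    "Whisper (via Deepgram)": {"endpoint"},
    "Base": {"streaming", "endpoint"},
}


def supports(model: str, mode: str) -> bool:
    """Return True if the table lists the mode as supported for the model."""
    if model not in MODE_SUPPORT:
        raise KeyError(f"unknown model: {model!r}")
    return mode in MODE_SUPPORT[model]


def streaming_models() -> list[str]:
    """List the models that support streaming, in alphabetical order."""
    return sorted(m for m, modes in MODE_SUPPORT.items() if "streaming" in modes)


print(supports("Nova-2 (Phone Call)", "streaming"))
print(supports("Whisper (via Deepgram)", "streaming"))
print(streaming_models())
```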

Choosing between modes

Use streaming mode when:

Perceived response latency is a top priority
Callers speak in short, clear phrases (command-style input)
Your agents handle simple, transactional intents where partial transcripts are sufficient
You have tuned Response Eagerness to prevent premature interruptions

Use endpoint-based mode when:

Callers speak in long or complex sentences that benefit from full-context transcription
Transcription accuracy is more important than latency (e.g., medical, legal contexts)
Callers have accents or speech patterns that cause streaming partial transcripts to be unstable
You are using a model that supports only endpoint-based mode, such as Nova-2 (Medical) or Whisper (via Deepgram)
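
The two checklists above can be read as a simple decision rule. The sketch below is one possible encoding, not an official DialNexa algorithm: the function and parameter names are hypothetical, and the precedence (any endpoint-based criterion overrides the latency preference) is an assumption of this example.

```python
def recommend_mode(
    *,
    long_or_complex_utterances: bool = False,
    accuracy_over_latency: bool = False,
    unstable_partial_transcripts: bool = False,
    model_is_endpoint_only: bool = False,
) -> str:
    """Suggest a transcription mode from the criteria listed above."""
    # A model that cannot stream leaves no choice.
    if model_is_endpoint_only:
        return "endpoint"
    # Assumed precedence: any endpoint-based criterion wins over latency.
    if (
        long_or_complex_utterances
        or accuracy_over_latency
        or unstable_partial_transcripts
    ):
        return "endpoint"
    # Otherwise favor the lower perceived latency of streaming.
    return "streaming"


print(recommend_mode())
print(recommend_mode(accuracy_over_latency=True))
```

Treat the result as a starting point and confirm it against real calls, since the criteria often pull in different directions.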

Response Eagerness relationship

Response Eagerness is a separate setting that controls how aggressively the agent interrupts or begins responding. It interacts directly with transcription mode:

Streaming + High Eagerness: very fast responses, higher risk of interrupting callers
Streaming + Low Eagerness: faster than endpoint-based, but the agent waits for more stable partial transcripts
Endpoint-based + any Eagerness: agent always waits for the full transcript before considering a response

Set Response Eagerness in your agent’s Speech Settings after choosing the transcription mode.
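
The three combinations above can be summarized as a small lookup, which may be useful in internal runbooks or test notes. This is an illustrative sketch: the labels "high" and "low" and the function name are assumptions, and this page does not specify the actual scale used for Response Eagerness.

```python
def expected_behavior(mode: str, eagerness: str) -> str:
    """Describe agent behavior for a mode and eagerness level (hypothetical labels)."""
    if mode == "endpoint":
        # Eagerness does not change when the full transcript becomes available.
        return "agent waits for the full transcript before considering a response"
    if mode == "streaming" and eagerness == "high":
        return "very fast responses, higher risk of interrupting callers"
    if mode == "streaming" and eagerness == "low":
        return "faster than endpoint-based, waits for more stable partial transcripts"
    raise ValueError(f"unsupported combination: {mode!r}, {eagerness!r}")


print(expected_behavior("streaming", "high"))
print(expected_behavior("endpoint", "low"))
```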

Configuring transcription mode

Transcription mode is set per agent in Settings > Speech > Transcription.

Open Speech Settings

Navigate to your agent, then go to Settings > Speech.

Select a transcription model

Choose the Deepgram model from the Transcription Model dropdown. The available modes for that model are shown automatically.

Select the mode

Under Transcription Mode, choose Streaming or Endpoint-based.

Adjust Response Eagerness

If you selected streaming mode, tune Response Eagerness to match your expected caller behavior. Start at the default and lower it if the agent is interrupting callers.
