Transcription accuracy directly affects agent behavior. A word error in the transcript propagates to the LLM, which may misinterpret the caller’s intent, extract the wrong data, or produce an incorrect response. This page covers techniques for improving transcript accuracy at every point in the pipeline.

Where transcription errors come from

Before applying fixes, identify the source of the error. The same symptom (agent responding incorrectly) can come from different causes:
| Symptom | Source |
| --- | --- |
| Agent uses wrong word that sounds like what the caller said | Transcription error (acoustic confusion) |
| Agent uses wrong word with no phonetic similarity | LLM hallucination or prompt issue |
| Specific domain terms are consistently wrong | Missing vocabulary hints |
| Errors only in noisy calls | Background noise, insufficient denoising |
| Errors only for specific callers | Accent not well-supported by current model |
| Numbers/dates extracted incorrectly | Text normalization issue, not transcription |
Check the Session History LLM call event to see the exact text that reached the LLM. If the transcription event shows correct text but the LLM output is wrong, the problem is in the LLM layer, not the transcriber.

Vocabulary hints

Vocabulary hints are custom words and phrases that tell the transcriber to expect specific terms. The transcriber uses hints to bias recognition toward these terms when the audio is ambiguous. Use vocabulary hints for:
  • Product names, brand names, proprietary terms that do not appear in general language models (e.g., “DialNexa”, “Nexabot”, “Vexa Pro”)
  • Medical, legal, or technical terminology specific to your domain
  • Names of people, places, or services that the transcriber commonly gets wrong
  • Short alphanumeric codes or identifiers that transcribers often split into separate words
Configuring vocabulary hints: Navigate to Settings > Speech > Vocabulary Hints. Add each term on a new line. For multi-word phrases, enter the full phrase. You can include phonetic spellings in parentheses for unusual terms.
DialNexa
Nexabot
appointment ID
haematology
Kovalam Beach
Keep your vocabulary hint list focused. Adding hundreds of common words does not help, because the transcriber already handles common vocabulary well. Prioritize the domain-specific terms that affect the most calls.
Limitations of vocabulary hints:
  • Hints bias recognition; they do not guarantee that a specific word is used. If the caller pronounces a term very differently from the expected pronunciation, hints may not help.
  • Deepgram’s vocabulary hint support varies by model. Verify that the model you are using accepts custom vocabulary.
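A hint list that several people maintain tends to collect duplicates and stray whitespace. The sketch below shows one way to tidy a list before pasting it into the Vocabulary Hints field. The `clean_hints` helper and its `max_terms` cap are hypothetical, and the default of 100 is an arbitrary illustration, not a DialNexa limit.

```python
def clean_hints(raw_terms, max_terms=100):
    """Trim terms, drop blanks and case-insensitive duplicates, keep original order.

    max_terms is an arbitrary illustrative cap, not a documented platform limit.
    """
    seen = set()
    cleaned = []
    for term in raw_terms:
        normalized = " ".join(term.split())  # trim and collapse internal whitespace
        key = normalized.casefold()
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned[:max_terms]


hints = clean_hints(["DialNexa", " Nexabot ", "dialnexa", "appointment  ID", ""])
print("\n".join(hints))  # one term per line, matching the settings field format
```

The first spelling of each term wins, so put the canonical casing first in your source list.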

Transcription model selection

Different Deepgram models have different accuracy profiles. Selecting the right model for your use case is often the highest-impact change you can make.
| Model | Best for |
| --- | --- |
| Nova-2 | General purpose, broad language coverage |
| Nova-2 (Phone Call) | Telephone audio quality (8 kHz, compression artifacts) |
| Nova-2 (Medical) | Medical terminology, clinical conversations |
| Whisper (via Deepgram) | Maximum accuracy and accented speech, at the cost of higher latency |
For most DialNexa deployments, Nova-2 (Phone Call) is the right default. It is trained on telephony audio and handles the compression and bandwidth limitations of phone calls better than the general Nova-2 model. Switch to Whisper if:
  • Your callers have diverse accents and Nova-2 accuracy is insufficient
  • Response latency is less critical than transcript accuracy (Whisper is slower)
  • You have a high-value use case where accuracy matters more than throughput

Background noise and denoising

Background noise increases word error rates. Server-side denoising (Denoising Mode in Speech Settings) reduces noise before the audio reaches the transcriber. For the denoising configuration guide, see Handle Background Noise. Key principle: apply only as much denoising as needed. High denoising on clean audio or accented speech can attenuate phonemes alongside noise, making transcription worse rather than better.
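To check whether a denoising change helped or hurt, you can compare word error rate (WER) on the same set of calls before and after the change. The sketch below computes WER as the word-level edit distance divided by the number of reference words. It assumes you supply a human-corrected reference transcript for each call; `word_error_rate` is an illustrative helper, not a DialNexa function.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "i want to book an appointment"
print(word_error_rate(reference, "i want to book a department"))
```

Run it over the same calls at each denoising level and keep the level with the lowest average WER, checking accented callers separately.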

Speaker diarization

Speaker diarization labels which portions of the transcript belong to which speaker (caller vs. agent). In DialNexa, diarization is most relevant for:
  • Post-call analysis that needs to distinguish caller statements from agent statements
  • Transcription cleanup where the caller and agent speak over each other
Enable diarization in Settings > Speech > Diarization (if available for your model). When enabled, transcript events include a speaker field (caller or agent) for each segment.
Diarization is a compute-intensive feature and adds latency to transcript delivery. Enable it only when post-call processing requires caller/agent separation.

Transcript cleanup post-processing

For use cases where transcript accuracy is critical for downstream analysis (medical, legal, compliance), apply post-processing to the raw transcript before using it. Post-processing approaches:
  • Acronym expansion: replace “AI” with “artificial intelligence”, “OTP” with “one-time password” based on domain-specific rules
  • Number normalization: standardize how dates and numbers appear in the transcript so that post-call analysis (PCA) extraction is consistent
  • Filler word removal: strip “um”, “uh”, “you know” from the caller transcript before LLM processing
Implement post-processing in a webhook handler that receives the call.transcript_ready event and runs cleanup before storing or analyzing the transcript. The cleaned transcript can then be passed to your own LLM analysis pipeline.
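A minimal sketch of the cleanup step such a handler might run is shown below. The `call.transcript_ready` event name comes from this page, but the payload field names (`type`, `transcript`), the acronym table, and the filler list are assumptions for illustration; check the actual webhook payload before relying on them.

```python
import re

# Hypothetical domain rules; replace with your own.
ACRONYMS = {"OTP": "one-time password"}

# Matches a filler word plus any trailing comma/period and whitespace.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+|you know)\b[,.]?\s*", re.IGNORECASE)


def remove_fillers(text):
    """Strip filler words, then collapse any doubled spaces left behind."""
    without_fillers = FILLER_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", without_fillers).strip()


def expand_acronyms(text, acronyms=ACRONYMS):
    """Replace whole-word, case-sensitive acronym matches with their expansions."""
    if not acronyms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, acronyms)) + r")\b")
    return pattern.sub(lambda match: acronyms[match.group(1)], text)


def clean_transcript(text):
    return expand_acronyms(remove_fillers(text))


def handle_event(event):
    """Return the cleaned transcript, or None for events this handler ignores.

    The "type" and "transcript" keys are assumed field names, not a documented schema.
    """
    if event.get("type") != "call.transcript_ready":
        return None
    return clean_transcript(event["transcript"])


print(handle_event({
    "type": "call.transcript_ready",
    "transcript": "Um, I need the OTP for my uh account",
}))
```

Note that a blanket rule for "you know" also removes legitimate uses such as "do you know my balance", so review filler rules against real transcripts before enabling them.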

Identifying transcription errors vs. LLM errors in Session History

1. Open the session in Session History
   Go to Monitor > Sessions, find the call, and open the event timeline.

2. Locate the problematic turn
   Find the agent response that was incorrect or surprising. Identify which caller utterance preceded it.

3. Check the transcription event for that utterance
   The transcription event shows the exact text the transcriber produced. Is the text correct? If the caller said “I want to book an appointment” but the transcription shows “I want to book a department”, that is a transcription error.

4. If transcription is correct, check the LLM event
   If the transcription event shows the correct caller text, open the LLM call event for the agent’s response. Examine the full input context. Did the LLM receive the correct conversation history? Was the prompt intact? If the LLM input was correct but the output was wrong, the issue is in the prompt or LLM behavior, not the transcriber.

5. Apply the right fix
  • Transcription error: adjust denoising, switch transcription model, add vocabulary hints
  • LLM error: revise system prompt, add constraints, check for context length issues
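When you review many sessions, the comparison in steps 3 and 4 can be sketched in code. The example below assumes you have listened to the recording and typed what the caller actually said, and that you copied the transcription text from the session timeline by hand. The function names are hypothetical; nothing here calls a DialNexa API.

```python
import difflib
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and split into words."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return stripped.split()


def word_substitutions(said, transcribed):
    """List (said, transcribed) word spans where the two texts differ."""
    said_words, heard_words = normalize(said), normalize(transcribed)
    matcher = difflib.SequenceMatcher(a=said_words, b=heard_words, autojunk=False)
    return [
        (" ".join(said_words[i1:i2]), " ".join(heard_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def faulty_layer(said, transcribed, agent_response_ok):
    """Return which layer to investigate: 'transcription', 'llm', or 'none'."""
    if word_substitutions(said, transcribed):
        return "transcription"
    return "none" if agent_response_ok else "llm"


said = "I want to book an appointment"
transcribed = "I want to book a department"
print(word_substitutions(said, transcribed))
print(faulty_layer(said, transcribed, agent_response_ok=False))
```

The transcription check runs first on purpose: a wrong transcript makes the LLM output unreliable as evidence, so fix the transcriber side before judging the prompt.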

Common transcription errors and fixes

| Error pattern | Fix |
| --- | --- |
| Domain-specific terms consistently wrong | Add to vocabulary hints |
| Numbers read as words instead of digits | Adjust text normalization settings |
| First word of each utterance frequently wrong | Check endpoint detection sensitivity; the beginning of the caller's speech may be getting cut off |
| All errors in calls from a specific region | Switch to Nova-2 (Phone Call) or Whisper for better accent coverage |
| Errors only in calls with background noise | Increase Denoising Mode from Low to High; test for accent impact |
| Agent name or product name consistently misspelled | Add to vocabulary hints with correct spelling |