Where transcription errors come from
Before applying fixes, identify the source of the error. The same symptom (the agent responding incorrectly) can come from different causes:

| Symptom | Source |
|---|---|
| Agent uses wrong word that sounds like what the caller said | Transcription error (acoustic confusion) |
| Agent uses wrong word with no phonetic similarity | LLM hallucination or prompt issue |
| Specific domain terms are consistently wrong | Missing vocabulary hints |
| Errors only in noisy calls | Background noise, insufficient denoising |
| Errors only for specific callers | Accent not well-supported by current model |
| Numbers/dates extracted incorrectly | Text normalization issue, not transcription |
Vocabulary hints
Vocabulary hints are custom words and phrases that tell the transcriber to expect specific terms. The transcriber uses hints to bias recognition toward these terms when the audio is ambiguous. Use vocabulary hints for:

- Product names, brand names, and proprietary terms that do not appear in general language models (e.g., “DialNexa”, “Nexabot”, “Vexa Pro”)
- Medical, legal, or technical terminology specific to your domain
- Names of people, places, or services that the transcriber commonly misrecognizes
- Short alphanumeric codes or identifiers that transcribers often read as separate words

Limitations:

- Hints bias recognition; they do not guarantee that a specific word is used. If the caller pronounces a term very differently from the expected pronunciation, hints may not help.
- Deepgram’s vocabulary hint support varies by model. Verify that the model you are using accepts custom vocabulary.
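To decide which terms are worth adding as hints, one option is to scan existing transcripts for words that look like misrecognitions of your domain terms. The sketch below is an illustration only: `find_near_misses` is a hypothetical helper, it uses spelling similarity from Python's `difflib` as a rough stand-in for acoustic similarity, and it only handles single-word terms.

```python
import difflib
import re


def find_near_misses(transcript: str, domain_terms: list[str],
                     cutoff: float = 0.75) -> dict[str, list[str]]:
    """Map each domain term to transcript words that resemble it but are not it.

    Assumption: spelling similarity approximates "sounds like". A real
    phonetic comparison would be more precise.
    """
    # Unique lowercase words in the transcript, sorted for stable results.
    words = sorted(set(re.findall(r"[a-z0-9']+", transcript.lower())))
    near_misses: dict[str, list[str]] = {}
    for term in domain_terms:
        target = term.lower()
        candidates = difflib.get_close_matches(target, words, n=5, cutoff=cutoff)
        # An exact match means the term was transcribed correctly.
        misses = [word for word in candidates if word != target]
        if misses:
            near_misses[term] = misses
    return near_misses


result = find_near_misses(
    "my nexabought stopped answering calls",
    ["Nexabot", "DialNexa"],
)
print(result)
```

Terms that show up repeatedly with near misses are reasonable candidates for the hint list; terms with no near misses probably do not need a hint.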
Transcription model selection
Different Deepgram models have different accuracy profiles. Selecting the right model for your use case is often the highest-impact change you can make.

| Model | Best for |
|---|---|
| Nova-2 | General purpose, broad language coverage |
| Nova-2 (Phone Call) | Telephone audio quality (8 kHz, compression artifacts) |
| Nova-2 (Medical) | Medical terminology, clinical conversations |
| Whisper (via Deepgram) | Maximum accuracy and accented speech, where higher latency is acceptable |

Consider Whisper when:

- Your callers have diverse accents and Nova-2 accuracy is insufficient
- Response latency is less critical than transcript accuracy (Whisper is slower)
- You have a high-value use case where accuracy matters more than throughput
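When comparing models on your own calls, a common yardstick is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the model's transcript into a hand-checked reference, divided by the number of reference words. The sketch below is a minimal version; `word_error_rate` is a hypothetical helper, and the normalization (lowercasing and splitting on whitespace, punctuation left as-is) is an assumption you may want to tighten.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "I want to book an appointment",
    "I want to book a department",
)
print(f"WER: {wer:.2f}")
```

Running the same set of reference calls through each candidate model and comparing the average WER gives a more reliable basis for switching than impressions from individual calls.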
Background noise and denoising
Background noise increases word error rates. Server-side denoising (Denoising Mode in Speech Settings) reduces noise before the audio reaches the transcriber. For the denoising configuration guide, see Handle Background Noise. Key principle: apply only as much denoising as needed. High denoising on clean audio or accented speech can attenuate phonemes alongside noise, making transcription worse rather than better.
Speaker diarization
Speaker diarization labels which portions of the transcript belong to which speaker (caller vs. agent). In DialNexa, diarization is most relevant for:

- Post-call analysis that needs to distinguish caller statements from agent statements
- Transcription cleanup where the caller and agent speak over each other

When diarization is enabled, each transcript segment carries a speaker field (caller or agent).
Diarization is a compute-intensive feature and adds latency to transcript delivery. Enable it only when post-call processing requires caller/agent separation.
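As an illustration of how post-call processing might use the speaker labels, the sketch below groups transcript segments by speaker. The segment shape (a dict with `speaker` and `text` keys) and the helper name `split_by_speaker` are assumptions for the example; check the actual transcript payload for the real field names.

```python
from collections import defaultdict


def split_by_speaker(segments: list[dict]) -> dict[str, list[str]]:
    """Group segment text by speaker label, preserving the original order."""
    by_speaker: dict[str, list[str]] = defaultdict(list)
    for segment in segments:
        by_speaker[segment["speaker"]].append(segment["text"])
    return dict(by_speaker)


# Hypothetical diarized transcript.
segments = [
    {"speaker": "agent", "text": "How can I help you today?"},
    {"speaker": "caller", "text": "I want to book an appointment."},
    {"speaker": "agent", "text": "Sure, which day works for you?"},
]

grouped = split_by_speaker(segments)
print(grouped["caller"])
```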
Transcript cleanup post-processing
For use cases where transcript accuracy is critical for downstream analysis (medical, legal, compliance), apply post-processing to the raw transcript before using it. Post-processing approaches:

- Acronym expansion: replace “AI” with “artificial intelligence”, “OTP” with “one-time password” based on domain-specific rules
- Number normalization: standardize how dates and numbers appear in the transcript for consistent PCA extraction
- Filler word removal: strip “um”, “uh”, “you know” from the caller transcript before LLM processing

A handler that listens for the call.transcript_ready event can run this cleanup before storing or analyzing the transcript. The cleaned transcript can then be passed to your own LLM analysis pipeline.
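The acronym expansion and filler removal steps can be sketched with regular expressions. This is a minimal illustration, not a complete cleanup pipeline: the function names, the `ACRONYMS` table, and the `FILLERS` list are assumptions for the example, and the event payload that would supply the transcript text is not modeled here.

```python
import re

# Domain-specific rules; extend these for your own use case.
ACRONYMS = {"AI": "artificial intelligence", "OTP": "one-time password"}
FILLERS = ["you know", "um", "uh"]

ACRONYM_RE = re.compile(r"\b(" + "|".join(map(re.escape, ACRONYMS)) + r")\b")
# Match a filler as a whole word, plus an optional trailing comma or period.
FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Strip filler words, then collapse any leftover double spaces."""
    cleaned = FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def expand_acronyms(text: str) -> str:
    """Replace upper-case acronyms with their expansions (case-sensitive)."""
    return ACRONYM_RE.sub(lambda match: ACRONYMS[match.group(1)], text)


def clean_transcript(text: str) -> str:
    return expand_acronyms(remove_fillers(text))


print(clean_transcript("um I did not get uh the OTP"))
```

Note that blanket removal of “you know” also deletes it where it carries meaning (for example, “do you know the price”), so review the filler list against real transcripts before relying on it.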
Identifying transcription errors vs. LLM errors in Session History
Open the session in Session History
Go to Monitor > Sessions, find the call, and open the event timeline.
Locate the problematic turn
Find the agent response that was incorrect or surprising. Identify which caller utterance preceded it.
Check the transcription event for that utterance
The transcription event shows the exact text the transcriber produced. Is the text correct? If the caller said “I want to book an appointment” but the transcription shows “I want to book a department”, that is a transcription error.
If transcription is correct, check the LLM event
If the transcription event shows the correct caller text, open the LLM call event for the agent’s response. Examine the full input context. Did the LLM receive the correct conversation history? Was the prompt intact?
If the LLM input was correct but the output was wrong, the issue is in the prompt or LLM behavior, not the transcriber.
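The decision order in these steps can be summarized as a small function. This is a sketch of the reasoning only: `classify_turn` and its inputs are hypothetical, the values are ones you would read off the session timeline by hand, and the third outcome (a problem with the LLM input itself) is an inference from the questions above rather than a category the platform reports.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so cosmetic differences are ignored."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def classify_turn(caller_said: str, transcribed: str, llm_input_ok: bool) -> str:
    """Return the most likely source of a bad agent response.

    caller_said:  what the caller actually said (from listening to the call)
    transcribed:  text shown in the transcription event
    llm_input_ok: whether the LLM event shows correct history and an intact prompt
    """
    if normalize(caller_said) != normalize(transcribed):
        return "transcription error"
    if llm_input_ok:
        return "prompt or LLM behavior"
    return "LLM input (history or prompt)"


print(classify_turn(
    "I want to book an appointment",
    "I want to book a department",
    llm_input_ok=True,
))
```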
Common transcription errors and fixes
| Error pattern | Fix |
|---|---|
| Domain-specific terms consistently wrong | Add to vocabulary hints |
| Numbers read as words instead of digits | Adjust text normalization settings |
| First word of each utterance frequently wrong | Check endpoint detection sensitivity; the start of the caller's speech may be getting cut off |
| Errors concentrated in calls from a specific region | Switch to Whisper for better accent coverage; if the cause is telephone audio quality rather than accent, try Nova-2 (Phone Call) |
| Errors only in calls with background noise | Increase Denoising Mode from Low to High; test for accent impact |
| Agent name or product name consistently misspelled | Add to vocabulary hints with correct spelling |