Creating an alert
Name the alert
Enter a descriptive name. Good alert names include the metric and the condition: “Failed Call Rate > 15% (Production Agent)” rather than “Alert 1”.
Select the metric
Choose the metric to monitor. Available metrics:
| Metric | Description |
|---|---|
| Failed call rate | Percentage of calls that ended without the expected outcome |
| Average call duration | Mean call length in seconds |
| Call volume | Number of calls in the window |
| Total cost | Accumulated TTS + LLM + transcription cost |
| LLM error rate | Percentage of LLM calls that returned an error |
| Function error rate | Percentage of function/tool calls that returned an error |
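To make the definitions in the table concrete, here is a minimal sketch of how the call-level metrics could be derived from the calls inside one window. The `Call` record and its field names are hypothetical, chosen only for illustration; how DialNexa stores calls and computes these values internally is not described here.

```python
from dataclasses import dataclass

@dataclass
class Call:
    # Hypothetical record shape, not DialNexa's actual schema.
    failed: bool        # call ended without the expected outcome
    duration_s: float   # call length in seconds
    cost_usd: float     # TTS + LLM + transcription cost for this call

def window_metrics(calls: list[Call]) -> dict[str, float]:
    """Aggregate the calls that fall inside one evaluation window."""
    volume = len(calls)
    if volume == 0:
        # Rates and averages are undefined with no calls; this sketch reports zeros.
        return {"call_volume": 0, "failed_call_rate": 0.0,
                "avg_call_duration": 0.0, "total_cost": 0.0}
    return {
        "call_volume": volume,
        "failed_call_rate": 100.0 * sum(c.failed for c in calls) / volume,
        "avg_call_duration": sum(c.duration_s for c in calls) / volume,
        "total_cost": sum(c.cost_usd for c in calls),
    }

calls = [Call(False, 120, 0.5), Call(True, 30, 0.25),
         Call(False, 90, 0.25), Call(False, 160, 1.0)]
metrics = window_metrics(calls)
print(metrics)
```

With these four sample calls, one failure gives a failed call rate of 25%, the average duration is 100 seconds, and the total cost is $2.00. LLM and function error rates would follow the same pattern, computed over individual LLM or function invocations instead of calls.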
Set the threshold and comparison
Define when the alert fires. Examples:
- Failed call rate greater than 15%
- Average call duration greater than 300 seconds
- Total cost greater than $50
- Call volume less than 10 (to detect when calls unexpectedly stop)
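The examples above all reduce to one check: compare the metric value for the window against the threshold. A minimal sketch of that check follows; the comparison names `greater_than` and `less_than` are assumptions for illustration and may not match the labels in the alert form.

```python
import operator

# Hypothetical comparison names; the alert form may label them differently.
COMPARISONS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
}

def alert_fires(value: float, comparison: str, threshold: float) -> bool:
    """Return True when the value satisfies the comparison against the threshold."""
    return COMPARISONS[comparison](value, threshold)

print(alert_fires(18.0, "greater_than", 15.0))  # failed call rate of 18% against 15%
print(alert_fires(15.0, "greater_than", 15.0))  # value exactly at the threshold
print(alert_fires(3, "less_than", 10))          # call volume of 3 against 10
```

This sketch treats both comparisons as strict, so a value exactly equal to the threshold does not fire. Whether DialNexa behaves the same way at the boundary is an assumption.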
Set the evaluation window
Choose how much data the alert considers when evaluating. Options: last 15 minutes, last 1 hour, last 6 hours, last 24 hours. Shorter windows catch problems faster but produce more false positives from small sample sizes. Use 1-hour windows for most operational alerts.
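The sample-size effect is easy to see with a little arithmetic. The call counts below are invented for illustration: the same agent is observed over a quiet 15-minute window and over a full hour.

```python
def failed_rate(failed: int, total: int) -> float:
    """Failed call rate as a percentage."""
    return 100.0 * failed / total

# Invented counts for illustration only.
short_window = failed_rate(2, 10)   # 15-minute window: 2 failures in 10 calls
long_window = failed_rate(4, 40)    # 1-hour window: 4 failures in 40 calls

# How far a single extra failure moves the rate in each window.
step_short = 100.0 / 10
step_long = 100.0 / 40

print(short_window, long_window, step_short, step_long)
```

Here the 15-minute window reads 20% and would trip a 15% threshold, while the 1-hour window reads 10% and would not. Each additional failure moves the short window by 10 percentage points but the long window by only 2.5, which is why short windows are noisier.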
Scope to an agent or number (optional)
By default, the alert evaluates across the entire workspace. Narrow it to a specific agent or inbound phone number if you want per-agent alerting.
Configure delivery
Choose how you want to receive the notification. Multiple delivery methods can be selected for a single alert.
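If you select webhook delivery, the receiving endpoint needs to accept a JSON POST and answer with a 2xx status. The following is a minimal sketch of such a receiver using only the Python standard library. The function and class names and the port are illustrative, and because the payload schema is not reproduced here, the sketch logs the whole payload instead of reading named fields.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_alert(body: bytes) -> int:
    """Validate an incoming alert body and return the HTTP status to send back."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes; a non-2xx status
        # is treated by the sender as a failed delivery.
        return 400
    # Payload fields are not assumed here, so log the payload as a whole.
    print("alert received:", payload)
    return 204

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_alert(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

def serve(port: int = 8080) -> None:
    """Start the receiver; call this from your own entry point."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Keeping the handler fast and moving slow work (paging, ticket creation) to a background job reduces the chance of a timeout. Since failed deliveries are retried, it is also prudent to make the handler tolerate receiving the same alert more than once.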
Delivery methods
Email
Enter one or more email addresses. When the alert fires, each address receives an email containing the alert name, the metric value that triggered it, the threshold, and a link to the Analytics Dashboard filtered to the relevant time range.
Webhook
Enter a webhook URL. When the alert fires, DialNexa sends a POST request with a JSON payload to the URL.
Alert history
Monitor > Alerts > History shows a log of every alert trigger event, including the metric value at the time of trigger, the delivery status (email sent, webhook delivered), and whether the delivery succeeded. If a webhook delivery fails (non-2xx response or timeout), DialNexa retries up to 3 times with exponential backoff. Failed deliveries appear in the alert history with the HTTP status code returned.
Recommended alerts
Failed call rate spike
Metric: Failed call rate
Threshold: Greater than 15%
Window: 1 hour
Why: A sudden increase in failed calls usually indicates a broken function/tool call, a prompt regression, or an upstream service outage. 15% is a reasonable threshold for most production agents — adjust based on your baseline.
Average duration spike
Metric: Average call duration
Threshold: Greater than 300 seconds (or 1.5x your baseline)
Window: 1 hour
Why: Long calls often indicate the agent is looping, confused, or waiting on a slow function call. A duration spike with stable volume points to a behavioral regression.
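The "1.5x your baseline" option can be worked out from a few typical hours of data. This is a small sketch with invented hourly averages; the function name and the choice of a plain mean as the baseline are assumptions.

```python
from statistics import mean

def duration_threshold(baseline_durations_s: list[float], multiplier: float = 1.5) -> float:
    """Threshold in seconds, as a multiple of the mean baseline duration."""
    return multiplier * mean(baseline_durations_s)

# Invented average call durations (seconds) from three typical hours.
typical_hours = [180, 200, 220]
threshold_s = duration_threshold(typical_hours)
print(threshold_s)
```

With a baseline of 200 seconds, the result is 300 seconds, which matches the fixed threshold suggested above. An agent with much shorter or longer typical calls would get a correspondingly different value.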
Daily cost threshold
Metric: Total cost
Threshold: Greater than 80-90% of your daily budget target
Window: 24 hours
Why: Prevents unexpected billing surprises. Set the threshold at 80-90% of your daily target to give yourself time to respond before hitting the hard limit.
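As a worked example of the 80-90% guidance, the sketch below computes the range for a given daily budget. The function name is illustrative and the $50 budget is an example figure only.

```python
def cost_alert_range(daily_budget_usd: float, low_pct: int = 80, high_pct: int = 90) -> tuple[float, float]:
    """Lower and upper bounds for the alert threshold, in dollars."""
    return (daily_budget_usd * low_pct / 100, daily_budget_usd * high_pct / 100)

low, high = cost_alert_range(50)
print(low, high)
```

For a $50 daily budget this gives a threshold between $40 and $45, leaving $5 to $10 of headroom before the budget is reached.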
Volume drop (dead agent detection)
Metric: Call volume
Threshold: Less than 5 calls
Window: 1 hour (during expected operating hours)
Why: If your agent normally receives 50+ calls per hour and volume drops near zero, something has broken upstream — a phone number issue, a routing misconfiguration, or a workspace error. This alert catches silent failures that do not show up in error rate metrics.
Function error rate
Metric: Function error rate
Threshold: Greater than 10%
Window: 30 minutes
Why: High function error rates indicate your integration endpoints are down, returning unexpected responses, or are being rate-limited. Catching this early prevents cascading call failures.
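If you track alert definitions alongside your other configuration, the five recommendations above could be captured as data for review. The `AlertSpec` shape and its field names are hypothetical; this is a way to record the settings, not a DialNexa import format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AlertSpec:
    # Hypothetical structure for documenting alert settings.
    name: str
    metric: str
    comparison: str
    threshold: Optional[float]  # None means "set from your own target"
    window: str

RECOMMENDED = [
    AlertSpec("Failed call rate spike", "Failed call rate", "greater_than", 15.0, "1 hour"),
    AlertSpec("Average duration spike", "Average call duration", "greater_than", 300.0, "1 hour"),
    # The cost threshold depends on your daily budget, so it is left unset here.
    AlertSpec("Daily cost threshold", "Total cost", "greater_than", None, "24 hours"),
    AlertSpec("Volume drop", "Call volume", "less_than", 5.0, "1 hour"),
    AlertSpec("Function error rate", "Function error rate", "greater_than", 10.0, "30 minutes"),
]

for spec in RECOMMENDED:
    print(f"{spec.name}: {spec.metric} {spec.comparison} {spec.threshold} over {spec.window}")
```

Keeping the list in version control makes it easier to see when a threshold was changed and why.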