A/B testing lets you run two published agent versions simultaneously on the same phone number or route, splitting traffic between them. Use this to compare prompt variations, voice models, LLM choices, or structural flow changes with real call data before committing to one version.

What You Can Learn

A/B testing produces a side-by-side comparison across:
  • Call outcomes: Completion rate, call duration, transfer rate, hang-up timing
  • Post-call analysis fields: Sentiment scores, goal completion flags, custom extraction fields
  • Transcript quality: Language naturalness, error handling, off-script recovery
  • Model behavior: How different LLMs or temperatures handle edge cases and ambiguous inputs
Run A/B tests on changes that are hard to evaluate in a sandbox. Prompt rewrites, voice changes, and model upgrades all benefit from live traffic comparison.

Prerequisites

  • Two published agent versions. These can differ by prompt, LLM, voice, or any other configuration.
  • A phone number or route to apply the split to.
  • Enough inbound or outbound call volume to reach statistical significance. Low-volume routes may need days or weeks to produce reliable results.
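
To gauge whether a route has enough volume, a rough sample-size estimate for comparing two completion rates can be sketched as follows. This is a standard textbook approximation, not something the platform computes. The 5% significance level, 80% power, the example rates, and the function names are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def calls_per_version(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate calls needed per version to detect rate_a vs rate_b.

    Uses the normal-approximation formula for a two-sided comparison of
    two proportions with equal-sized groups (an assumption of this sketch).
    """
    if rate_a == rate_b:
        raise ValueError("rates must differ to estimate a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    mean_rate = (rate_a + rate_b) / 2
    spread_null = math.sqrt(2 * mean_rate * (1 - mean_rate))
    spread_alt = math.sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    needed = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return math.ceil(needed / (rate_a - rate_b) ** 2)


def days_at_even_split(rate_a, rate_b, calls_per_day):
    """Days a 50/50 split needs, given the route's total daily call volume."""
    total_calls = 2 * calls_per_version(rate_a, rate_b)
    return math.ceil(total_calls / calls_per_day)


# Hypothetical example: current completion rate 60%, hoping to detect 70%.
print(calls_per_version(0.60, 0.70))
print(days_at_even_split(0.60, 0.70, calls_per_day=40))
```

Smaller expected differences need far more calls, so low-volume routes are better suited to testing large changes.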

Set Up a Traffic Split

1. Open the phone number or route

Navigate to Phone Numbers and click the number you want to test on. For outbound routes, open the relevant workflow or campaign.

2. Enable A/B Testing

In the number or route detail view, find the A/B Testing section and toggle it on.

3. Select Version A and Version B

Choose the two published agent versions to compare. Version A is typically your current production version; Version B is the variant you are testing.

4. Set the traffic split

Set the percentage of traffic each version receives. A 50/50 split produces the fastest results because both versions accumulate calls at the same rate. To limit exposure to a new version, start by sending it only 10% or 20% of traffic and keep the remaining 90% or 80% on the stable version.

Traffic is split per call at random. Individual callers are not “sticky” to a version across calls unless you implement caller ID-based routing in your flow.

5. Save and activate

Click Save. The split takes effect on the next call to that number. Both versions run simultaneously from this point.
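
The difference between a per-call random split and caller ID-based "sticky" routing, as described in step 4, can be illustrated with a short sketch. This is not the platform's implementation. The function names, the hashing scheme, and the example phone number are assumptions made for illustration.

```python
import hashlib
import random


def random_version(share_b, rng=random):
    """Per-call random split: every call is assigned independently."""
    return "B" if rng.random() < share_b else "A"


def sticky_version(caller_id, share_b):
    """Caller ID-based split: the same caller always gets the same version.

    The caller ID is hashed to a stable number in [0, 1), which is then
    compared against the share of traffic reserved for Version B.
    """
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "B" if bucket < share_b else "A"


caller = "+15550100"  # hypothetical caller ID
# Three calls from one caller may land on different versions...
print([random_version(0.2) for _ in range(3)])
# ...whereas the hashed assignment is identical on every call.
print([sticky_version(caller, 0.2) for _ in range(3)])
```

With a random split, the observed share of calls per version will drift around the configured percentage, especially at low volume.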

Read Results

Open the A/B test results view from the number or route detail page. Results update in real time as calls complete.

Metrics Table

Metric | Description
Calls (A / B) | Total calls handled by each version
Avg Duration | Mean call duration per version
Completion Rate | Percentage of calls that ended with a defined success outcome (based on post-call analysis fields)
Transfer Rate | Percentage of calls that triggered a transfer action
Avg Sentiment | Mean sentiment score from post-call analysis (if sentiment is enabled)
Custom Fields | Any post-call extraction fields you have defined, compared as averages or distributions

Transcript Sampling

The results view shows a random sample of transcripts from each version side by side. Review these manually to assess qualitative differences in language, error handling, and caller experience.
Statistical significance is not automatically calculated. Use a standard two-proportion z-test or chi-squared test on completion rates once each version has at least 100 calls.
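
A minimal sketch of the two-proportion z-test on completion rates, using only the Python standard library, might look like this. The function name and the example counts are hypothetical. The counts would come from the Calls and Completion Rate rows of the metrics table.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(completed_a, calls_a, completed_b, calls_b):
    """Two-sided z-test for a difference between two completion rates.

    Returns (z, p_value). A positive z means Version B completed at a
    higher rate than Version A.
    """
    rate_a = completed_a / calls_a
    rate_b = completed_b / calls_b
    # Pooled rate under the null hypothesis that both versions are equal.
    pooled = (completed_a + completed_b) / (calls_a + calls_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / calls_a + 1 / calls_b))
    if std_err == 0:
        raise ValueError("completion rates are all 0% or all 100%")
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: A completed 120 of 200 calls, B completed 140 of 200.
z, p_value = two_proportion_z_test(120, 200, 140, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A p-value below a threshold chosen in advance (0.05 is a common convention) suggests the difference is unlikely to be due to chance alone. Checking results repeatedly and stopping at the first significant reading inflates the false-positive rate, so decide on the call count before starting.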

End the Test

When you have enough data to make a decision:
1. Review results

Confirm that one version outperforms the other on your key metric (completion rate, sentiment, or another post-call analysis field).

2. Disable A/B testing

Open the A/B Testing section and toggle it off. You will be prompted to select which version to keep as the active version.

3. Select the winner

Choose the winning version. All traffic routes to that version immediately. The losing version remains published but is no longer assigned to this route.

If you disable A/B testing without selecting a winner, the system defaults to Version A. Confirm which version is selected before saving.

Limitations

  • Each phone number supports one traffic split at a time, and a split compares exactly two versions. You cannot run a three-way split.
  • Both versions must be published. Draft versions cannot be included in a split.
  • Outbound batch campaigns do not support per-call A/B splits through the UI. For batch A/B testing, create two separate batches with different agent versions and compare results manually.
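
For the manual batch approach in the last limitation, the contact list should be divided at random so the two batches are comparable. A minimal sketch, assuming contacts are held as a simple list of phone numbers (the function name and the sample numbers are hypothetical):

```python
import random


def split_contacts(contacts, share_b=0.5, seed=None):
    """Randomly divide contacts into two batches, one per agent version.

    share_b is the fraction of contacts assigned to the Version B batch.
    Passing a seed makes the split reproducible.
    """
    shuffled = list(contacts)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - share_b))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical contact list of ten numbers, with 20% going to Version B.
contacts = [f"+1555010{i:04d}" for i in range(10)]
batch_a, batch_b = split_contacts(contacts, share_b=0.2, seed=42)
print(len(batch_a), len(batch_b))
```

Shuffling before splitting avoids bias from list order, for example when contacts are sorted by signup date or region.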