Skip to main content
Simulation testing runs a scripted call scenario through the agent without placing a real phone call. You define a sequence of caller inputs and the expected agent behaviors at each turn. The simulation engine executes the scenario automatically and reports pass or fail for each verification point. Use simulation testing for regression testing after prompt changes, version comparisons, and automated quality gates in CI/CD pipelines.

What Simulation Testing Covers

  • Multi-turn conversation flow
  • Agent response content at specific turns (keyword or phrase matching)
  • Function call invocation (did the agent call the expected function?)
  • Function call arguments (did the agent pass the correct parameters?)
  • Conversation end state (did the call end as expected?)
  • Variable handling (are variables substituted correctly in responses?)

What It Does Not Cover

  • Voice quality or pronunciation
  • Real transcription accuracy (caller inputs are injected as text, bypassing ASR)
  • Telephony carrier behavior
  • Real-time latency

Create a Simulation

1

Open Simulations

Navigate to your agent and click the Simulations tab.
2

Click New Simulation

Click New Simulation. Give it a descriptive name (e.g., Happy path - appointment booking, Edge case - caller refuses to provide name).
3

Set variables

If your agent prompt uses variables, define their values for this simulation scenario. Click Variables and enter the key-value pairs.
4

Define the caller script

Add turns to the simulation. Each turn has:
  • Caller Input: The text the caller “says” at this turn. This is injected directly into the LLM as if it were a transcribed utterance.
  • Expected Behaviors: One or more checks to run against the agent’s response for this turn.
Add as many turns as needed to cover the full scenario.
5

Configure expected behaviors

For each turn, you can define one or more expected behaviors:
Behavior TypeWhat It Checks
Contains textAgent response includes a specific word or phrase
Does not contain textAgent response does not include a specific word or phrase
Calls functionAgent triggered a specific function call
Function argument equalsA specific function argument matches an expected value
Conversation endsThe agent ended the call at this turn
Custom regexAgent response matches a regular expression pattern
6

Save the simulation

Click Save. The simulation is stored and ready to run.

Run a Simulation

1

Select the simulation

In the Simulations tab, click the simulation you want to run.
2

Select the agent version

Choose the agent version to test against: the current draft, or a specific published version.
3

Click Run

Click Run. The simulation engine executes the caller script turn by turn. Each turn’s result appears in real time.

Read Simulation Results

After the run completes, the results view shows:
  • Overall status: Pass (all checks passed) or Fail (one or more checks failed).
  • Turn-by-turn breakdown: Each caller turn is listed with:
    • The agent’s full response text
    • The result of each expected behavior check (pass or fail)
    • For failures: what was expected vs. what was found
    • Any function calls made during that turn

Example Turn Result

Turn 2: "My name is Alex Rivera"
  Agent response: "Thanks, Alex! Let me pull up your account."
  
  Checks:
  [PASS] Contains text: "Alex"
  [PASS] Calls function: lookup_account
  [PASS] Function argument "name" equals "Alex Rivera"

Debugging a Failure

When a turn fails:
  1. Click the failed turn to expand the full agent response.
  2. Review the function call arguments if a function was triggered.
  3. Adjust the prompt or expected behavior definition, then re-run the simulation.
If a check fails intermittently across multiple runs of the same simulation, the LLM may be producing non-deterministic responses. Reduce temperature in the Engine Tab or make the expected behavior check less strict (use “contains” rather than exact match).

Regression Testing After Prompt Changes

Set up a suite of simulations covering your core call scenarios and edge cases. Run the full suite after every prompt change or agent configuration update:
  1. Open the Simulations tab.
  2. Click Run All.
  3. Review the overall pass rate across all simulations.
  4. Investigate any newly failing simulations to understand the impact of your change.
A regression is indicated when a simulation that previously passed now fails after a change. This helps you catch unintended side effects before publishing a new agent version.

Limitations

LimitationDetails
Text input onlyCaller inputs are text. ASR transcription errors that occur on real calls are not simulated.
No audioVoice quality and pronunciation issues are not detectable.
Non-deterministic LLMEven with low temperature, the LLM may vary its exact wording. Write behavior checks to match intent rather than exact phrasing.
No telephonyCarrier-level events (no answer, busy, connection failure) cannot be simulated.