What Simulation Testing Covers
- Multi-turn conversation flow
- Agent response content at specific turns (keyword or phrase matching)
- Function call invocation (did the agent call the expected function?)
- Function call arguments (did the agent pass the correct parameters?)
- Conversation end state (did the call end as expected?)
- Variable handling (are variables substituted correctly in responses?)
What It Does Not Cover
- Voice quality or pronunciation
- Real transcription accuracy (caller inputs are injected as text, bypassing ASR)
- Telephony carrier behavior
- Real-time latency
Create a Simulation
Click New Simulation
Click New Simulation. Give it a descriptive name (e.g., Happy path - appointment booking, Edge case - caller refuses to provide name).
Set variables
If your agent prompt uses variables, define their values for this simulation scenario. Click Variables and enter the key-value pairs.
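To illustrate what variable substitution amounts to, here is a minimal sketch of filling prompt variables from a scenario's key-value pairs. The `{{name}}` placeholder syntax, the function names, and the sample values are assumptions made for illustration; the product's actual template syntax may differ.

```python
import re

# Hypothetical placeholder syntax: {{variable_name}}. The real product may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{key}} with its value; raise if a key has no value."""
    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"No value defined for variable '{key}'")
        return variables[key]

    return PLACEHOLDER.sub(substitute, template)


def unresolved_placeholders(text: str) -> list:
    """Return the names of any placeholders still present in the text."""
    return PLACEHOLDER.findall(text)


# Illustrative key-value pairs for one simulation scenario.
scenario_variables = {"customer_name": "Dana", "clinic_name": "Lakeside Dental"}
prompt = render_prompt(
    "Greet {{customer_name}} on behalf of {{ clinic_name }}.",
    scenario_variables,
)
print(prompt)
```

A leftover placeholder in an agent response is a common symptom of a variable that was never defined for the scenario, which is why the sketch raises on a missing key instead of leaving the placeholder in place.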
Define the caller script
Add turns to the simulation. Each turn has:
- Caller Input: The text the caller “says” at this turn. This is injected directly into the LLM as if it were a transcribed utterance.
- Expected Behaviors: One or more checks to run against the agent’s response for this turn.
Configure expected behaviors
For each turn, you can define one or more expected behaviors:
| Behavior Type | What It Checks |
|---|---|
| Contains text | Agent response includes a specific word or phrase |
| Does not contain text | Agent response does not include a specific word or phrase |
| Calls function | Agent triggered a specific function call |
| Function argument equals | A specific function argument matches an expected value |
| Conversation ends | The agent ended the call at this turn |
| Custom regex | Agent response matches a regular expression pattern |
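The behavior types in the table can be thought of as small predicates over one agent turn. The sketch below is not the product's implementation: the `AgentTurn` and `FunctionCall` classes, the behavior dictionary keys, and the case-insensitive text matching are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FunctionCall:
    name: str
    arguments: dict


@dataclass
class AgentTurn:
    response_text: str
    function_calls: list = field(default_factory=list)
    conversation_ended: bool = False


def check_behavior(turn: AgentTurn, behavior: dict) -> bool:
    """Evaluate one expected behavior against one agent turn (hypothetical schema)."""
    kind = behavior["type"]
    if kind == "contains_text":
        # Assumption: text matching is case-insensitive.
        return behavior["text"].lower() in turn.response_text.lower()
    if kind == "does_not_contain_text":
        return behavior["text"].lower() not in turn.response_text.lower()
    if kind == "calls_function":
        return any(c.name == behavior["function"] for c in turn.function_calls)
    if kind == "function_argument_equals":
        return any(
            c.name == behavior["function"]
            and c.arguments.get(behavior["argument"]) == behavior["value"]
            for c in turn.function_calls
        )
    if kind == "conversation_ends":
        return turn.conversation_ended
    if kind == "custom_regex":
        return re.search(behavior["pattern"], turn.response_text) is not None
    raise ValueError(f"Unknown behavior type: {kind}")


# Illustrative turn: the agent confirms a booking and calls a function.
turn = AgentTurn(
    response_text="You're booked for Tuesday at 3 PM.",
    function_calls=[FunctionCall("book_appointment", {"day": "Tuesday", "time": "15:00"})],
)
behaviors = [
    {"type": "contains_text", "text": "tuesday"},
    {"type": "does_not_contain_text", "text": "sorry"},
    {"type": "calls_function", "function": "book_appointment"},
    {"type": "function_argument_equals", "function": "book_appointment",
     "argument": "time", "value": "15:00"},
    {"type": "conversation_ends"},
    {"type": "custom_regex", "pattern": r"\b\d{1,2}\s?(AM|PM)\b"},
]
results = [check_behavior(turn, b) for b in behaviors]
print(results)
```

In this example every check passes except "conversation ends", because the illustrative turn does not end the call.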
Run a Simulation
Select the agent version
Choose the agent version to test against: the current draft, or a specific published version.
Read Simulation Results
After the run completes, the results view shows:
- Overall status: Pass (all checks passed) or Fail (one or more checks failed).
- Turn-by-turn breakdown: Each caller turn is listed with:
  - The agent’s full response text
  - The result of each expected behavior check (pass or fail)
  - For failures: what was expected vs. what was found
  - Any function calls made during that turn
Example Turn Result
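As a rough illustration, a single turn result could be represented and summarized as in the sketch below. The field names, the sample conversation, and the output layout are hypothetical; they mirror the items listed under Read Simulation Results, not the product's actual result format.

```python
# Hypothetical shape of one turn's result; field names are illustrative only.
turn_result = {
    "turn": 2,
    "caller_input": "I'd like to book for Tuesday at 3 PM.",
    "agent_response": "Sure, I can help with that. What name should I put it under?",
    "function_calls": [],  # no function was called during this turn
    "checks": [
        {"type": "Contains text", "expected": "name", "found": "name", "passed": True},
        {"type": "Calls function", "expected": "book_appointment", "found": None, "passed": False},
    ],
}


def summarize_turn(result: dict) -> list:
    """Build a readable summary: turn status first, then one line per check."""
    status = "PASS" if all(check["passed"] for check in result["checks"]) else "FAIL"
    lines = [f"Turn {result['turn']}: {status}"]
    for check in result["checks"]:
        if check["passed"]:
            lines.append(f"  pass  {check['type']}: {check['expected']}")
        else:
            # For failures, show what was expected next to what was found.
            lines.append(
                f"  fail  {check['type']}: expected {check['expected']!r}, found {check['found']!r}"
            )
    return lines


for line in summarize_turn(turn_result):
    print(line)
```

Here the turn fails overall because one of its two checks failed, which matches the rule that a simulation passes only when all checks pass.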
Debugging a Failure
When a turn fails:
- Click the failed turn to expand the full agent response.
- Review the function call arguments if a function was triggered.
- Adjust the prompt or expected behavior definition, then re-run the simulation.
Regression Testing After Prompt Changes
Set up a suite of simulations covering your core call scenarios and edge cases. Run the full suite after every prompt change or agent configuration update:
- Open the Simulations tab.
- Click Run All.
- Review the overall pass rate across all simulations.
- Investigate any newly failing simulations to understand the impact of your change.
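The comparison in the last two steps can be sketched in a few lines. This assumes you have recorded each simulation's pass/fail status before and after a change; the dictionaries, the function names, and the "Reschedule request" simulation are hypothetical, and nothing here relies on an export feature of the product.

```python
def pass_rate(results: dict) -> float:
    """Fraction of simulations that passed (0.0 for an empty suite)."""
    if not results:
        return 0.0
    return sum(1 for passed in results.values() if passed) / len(results)


def newly_failing(before: dict, after: dict) -> list:
    """Simulations that passed before the change and fail after it."""
    return sorted(
        name
        for name, passed in after.items()
        if not passed and before.get(name, False)
    )


# Illustrative statuses recorded before and after a prompt change.
before = {
    "Happy path - appointment booking": True,
    "Edge case - caller refuses to provide name": True,
    "Reschedule request": False,
}
after = {
    "Happy path - appointment booking": True,
    "Edge case - caller refuses to provide name": False,
    "Reschedule request": False,
}

print(f"Pass rate before: {pass_rate(before):.0%}, after: {pass_rate(after):.0%}")
print("Newly failing:", newly_failing(before, after))
```

Separating newly failing simulations from ones that were already failing keeps attention on the regressions your latest change introduced.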
Limitations
| Limitation | Details |
|---|---|
| Text input only | Caller inputs are text. ASR transcription errors that occur on real calls are not simulated. |
| No audio | Voice quality and pronunciation issues are not detectable. |
| Non-deterministic LLM | Even with low temperature, the LLM may vary its exact wording. Write behavior checks to match intent rather than exact phrasing. |
| No telephony | Carrier-level events (no answer, busy, connection failure) cannot be simulated. |
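To make the advice on non-deterministic wording concrete, the sketch below compares an exact-phrase check with a regex that targets intent. The sample responses and the pattern are illustrative assumptions; the pattern uses Python `re` syntax, and the regex dialect accepted by the Custom regex behavior is not specified here.

```python
import re

# Three plausible wordings of the same confirmation (illustrative only).
responses = [
    "Your appointment is confirmed for Tuesday at 3 PM.",
    "Great, you're all booked for Tuesday at 3pm!",
    "I've scheduled you for Tuesday, 3 PM.",
]

# Brittle: passes only when the agent repeats one exact sentence.
exact_phrase = "Your appointment is confirmed for Tuesday at 3 PM."

# More robust: a confirmation verb, then the day, then the time, in any wording.
intent_pattern = re.compile(
    r"\b(confirmed|booked|scheduled)\b.*\btuesday\b.*\b3\s*pm\b",
    re.IGNORECASE,
)

exact_hits = [exact_phrase in response for response in responses]
intent_hits = [intent_pattern.search(response) is not None for response in responses]

print("Exact phrase:", exact_hits)
print("Intent regex:", intent_hits)
```

An intent pattern like this can still match a negated sentence such as "not confirmed for Tuesday at 3 PM", so it may be worth pairing it with a "Does not contain text" check for phrases that signal refusal.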