Fixing agent behavior is fundamentally a prompt and configuration problem. The agent does exactly what the LLM decides to do, given the system prompt, the conversation history, and the available tools. If the agent is doing something wrong, one of those three inputs is the cause. This guide gives you a systematic process for identifying which input is the problem and how to fix it.

Start with the Transcript

Before changing anything, read the transcript of a call where the behavior occurred. Specifically:
  1. What did the caller say immediately before the unwanted behavior?
  2. What did the agent say (or not say)?
  3. Was a tool invoked? What did it return?
  4. Is the behavior deterministic (happens every time) or occasional?
Deterministic issues usually point to a structural problem in the prompt or flow. Occasional issues suggest a temperature or model consistency problem.
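Question 4 is easier to answer with a number than a hunch. The sketch below computes how often a behavior appears across a set of comparable calls and labels it deterministic only when it reaches a threshold (by default, every call). The transcript shape (a list of `(speaker, text)` turns) and the helper names `occurrence_rate`, `classify` and `quotes_price` are hypothetical; adapt them to however you export transcripts.

```python
from typing import Callable

# Hypothetical export format: each call is a list of (speaker, text) turns.
Turn = tuple[str, str]

def occurrence_rate(calls: list[list[Turn]], shows_issue: Callable[[list[Turn]], bool]) -> float:
    """Fraction of comparable calls in which the unwanted behavior appears."""
    if not calls:
        return 0.0
    return sum(1 for call in calls if shows_issue(call)) / len(calls)

def classify(rate: float, threshold: float = 1.0) -> str:
    """'deterministic' if the rate reaches the threshold (default: every call), else 'occasional'."""
    return "deterministic" if rate >= threshold else "occasional"

# Example predicate: the agent quoted a price even though none was provided to it.
def quotes_price(call: list[Turn]) -> bool:
    return any(speaker == "agent" and "$" in text for speaker, text in call)

calls = [
    [("caller", "What do you charge?"), ("agent", "It's $50 per visit.")],
    [("caller", "How much is it?"), ("agent", "I can connect you with our team.")],
]
rate = occurrence_rate(calls, quotes_price)
print(rate, classify(rate))  # 0.5 occasional
```

Only compare calls where the caller said roughly the same thing; a low rate across unrelated calls says little about consistency.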

Common Behavioral Problems and Fixes

The agent is not respecting constraints in the system prompt. This is the most common issue.
Fixes:
  • Move the constraint closer to the top of the system prompt. LLMs often give more weight to instructions that appear early.
  • Make the constraint explicit and negative: instead of “focus on appointments”, say “Do not discuss topics unrelated to appointment booking. If asked about anything else, politely redirect the caller.”
  • Add a reminder at the end of the prompt: “Always stay within your role as a booking assistant.”
  • If you are using a multi-prompt architecture, check that the constraint is present in the specific prompt where the behavior occurs; it is not necessarily inherited from another prompt.
The agent is hallucinating facts or misremembering details.
Fixes:
  • Never rely on the LLM’s parametric knowledge for facts that need to be accurate (business hours, prices, addresses, policies). Pass this information explicitly in the system prompt or via a tool that fetches it dynamically.
  • Use a dynamic variable ({{business_hours}}) to inject current facts into the prompt at call time, rather than hardcoding them.
  • Add an explicit instruction: “Only state information that is explicitly provided to you. Do not guess or infer details.”
  • Switch to a more capable model (for example, GPT-4o) if the information is complex and the current model is struggling with it.
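To make the dynamic-variable idea concrete, here is a minimal sketch of what substitution at call time amounts to. The platform fills `{{business_hours}}` for you; this is only an illustration, useful if you also assemble prompt text in your own backend. The function name `render_prompt` is hypothetical. It raises an error for a missing value rather than leaving a blank, because an empty fact invites the model to guess.

```python
import re

# Matches {{variable_name}} placeholders, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Fill {{name}} placeholders; fail loudly if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Refuse rather than inject an empty string the model might paper over.
            raise KeyError(f"missing dynamic variable: {name}")
        return variables[name]
    return PLACEHOLDER.sub(substitute, template)

template = (
    "You are a booking assistant. Our hours are {{business_hours}}. "
    "Only state information that is explicitly provided to you."
)
print(render_prompt(template, {"business_hours": "Monday to Friday, 9am to 6pm"}))
```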
The LLM does not invoke a tool even when the situation calls for it.
Fixes:
  • Make the tool invocation condition explicit in the prompt: “When the caller wants to book an appointment, you MUST use the book_calendar tool. Never confirm an appointment without using this tool.”
  • Review the tool’s description. The LLM decides when to use a tool largely from its description. If the description is vague, the LLM won’t know when to invoke it.
  • Check whether the tool is actually enabled on the agent. Go to the agent’s Tools tab and confirm the tool is listed and active.
  • Add a few examples in the prompt showing the scenario where the tool should be called.
The LLM is invoking a tool in situations where it shouldn’t.
Fixes:
  • Add a negative condition to the tool description: “Only invoke this tool when the caller has explicitly confirmed their appointment details. Do not invoke this tool to check availability.”
  • If you have two tools that the LLM confuses (e.g., check_slots and book_slot), make their descriptions sharply distinct in purpose.
  • Add prompt instructions: “Before booking, always confirm the caller’s name, date, and time. Do not call book_calendar until all three are confirmed.”
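Prompt instructions reduce premature calls but cannot guarantee them away, so it can help to enforce the same rule in the backend that serves the tool. A minimal sketch, assuming your `book_calendar` tool is backed by a handler you control and that its arguments are named `full_name`, `date` and `time` (all hypothetical). The handler refuses to book when a field is empty and returns a message phrased as an instruction to the LLM.

```python
from dataclasses import dataclass

# Hypothetical argument names; match them to your own tool's parameters.
REQUIRED_FIELDS = ("full_name", "date", "time")

@dataclass
class ToolResult:
    ok: bool
    message: str  # text returned to the LLM as the tool's output

def handle_book_calendar(args: dict) -> ToolResult:
    """Hypothetical backend handler behind the book_calendar tool."""
    missing = [f for f in REQUIRED_FIELDS if not str(args.get(f) or "").strip()]
    if missing:
        # Phrase the error as an instruction so the model knows what to do next.
        return ToolResult(False, "Not booked. Confirm with the caller first: " + ", ".join(missing))
    # Placeholder: a real handler would call the calendar system here.
    return ToolResult(True, f"Booked {args['full_name']} on {args['date']} at {args['time']}.")

print(handle_book_calendar({"full_name": "Ana Ruiz", "date": "2024-06-03"}))
```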
The agent is triggering its end-call logic prematurely.
Fixes:
  • Review the end-call condition in your prompt. If it says “end the call when the conversation is complete,” the LLM may interpret a natural pause as completion.
  • Be specific: “Only end the call after the caller explicitly says goodbye or indicates they have no further questions.”
  • In conversation flow mode, check the transition conditions on nodes that lead to the End Node. An overly broad condition may be firing too early.
The call goes on indefinitely because the agent doesn’t recognize when to end it.
Fixes:
  • Add an explicit end condition: “Once the appointment is confirmed and the caller has no further questions, thank them and end the call.”
  • Use the max_duration setting to cap call length as a safety net.
  • In conversation flow mode, ensure there is a valid path to the End Node for all happy-path scenarios.
The agent is producing overly long responses that sound unnatural in a voice context.
Fixes:
  • Add an explicit length constraint: “Keep all responses under 2 sentences. Speak naturally and concisely - this is a phone call, not a written message.”
  • Add: “Do not use bullet points, lists, or formatting. Speak in plain conversational sentences.”
  • Reduce temperature slightly (e.g., from 0.7 to 0.5) to make responses more focused.
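When reviewing transcripts after a change like this, a quick automated check can flag agent turns that still break the length and formatting rules. The sketch below is a rough heuristic (it splits sentences on `.`, `!` and `?`, so abbreviations such as "Dr." will be overcounted); the helper names are hypothetical and the limit of two sentences mirrors the example instruction above.

```python
import re

SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")
# A bullet or numbered-list marker at the start of a line.
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)

def count_sentences(text: str) -> int:
    """Crude sentence count: split on terminal punctuation followed by space or end."""
    return len([s for s in SENTENCE_END.split(text) if s.strip()])

def voice_style_problems(text: str, max_sentences: int = 2) -> list[str]:
    """Return human-readable problems with one agent turn (empty list if none)."""
    problems = []
    n = count_sentences(text)
    if n > max_sentences:
        problems.append(f"{n} sentences (limit {max_sentences})")
    if LIST_MARKER.search(text):
        problems.append("contains list formatting")
    return problems

print(voice_style_problems("Sure. I can help with that. What day works for you?"))
```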
The agent sounds robotic or dismissive.
Fixes:
  • Add personality guidance: “Speak warmly and naturally. Acknowledge what the caller said before responding.”
  • Provide an example of a good response in the prompt: “For example, if the caller says they want to cancel, say: ‘Of course, I can help with that. Let me pull up your appointment…’”
  • Increase temperature slightly (e.g., from 0.3 to 0.6) to allow more natural variation.

Adjusting Temperature

Temperature controls how deterministic the LLM’s responses are. Lower values produce more predictable, focused output; higher values produce more varied, creative output.
| Temperature | Effect | Best for |
| --- | --- | --- |
| 0.0 - 0.3 | Very consistent, sometimes robotic | Structured data extraction, strict compliance scenarios |
| 0.4 - 0.6 | Balanced consistency and naturalness | Most customer service use cases |
| 0.7 - 0.9 | More natural and varied, occasionally unpredictable | Conversational agents, open-ended interactions |
| 1.0+ | Highly creative, prone to hallucination | Rarely appropriate for voice agents |
Set temperature in your agent’s Model settings. Start at 0.5 and adjust based on observed behavior.
If the agent gives inconsistent answers to the same question on different calls, lowering temperature is usually the fix. If the agent sounds robotic or repetitive, raising it slightly helps.
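Why temperature behaves this way: in the standard sampling setup, the model's raw scores for each candidate next token are divided by the temperature before being turned into probabilities with a softmax. The sketch below shows that textbook definition with made-up scores; the platform's exact sampling implementation is not documented here, so treat this as an illustration of the general mechanism only.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Many APIs accept 0 and treat it as greedy decoding; the formula itself needs T > 0.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate next tokens
for t in (0.3, 0.5, 0.9):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
```

At low temperature almost all probability sits on the top-scoring token, so the same input tends to produce the same wording; as temperature rises, the alternatives get a larger share, which is where both natural variation and unpredictability come from.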

Improving Tool Descriptions

The LLM reads tool descriptions to decide when and how to use each tool. A poorly written description is one of the most common causes of incorrect tool behavior. A good tool description answers:
  1. What does this tool do?
  2. When should I call it (and when should I not)?
  3. What information do I need before calling it?
  4. What will I get back?
Example of a weak description:
Books appointments.
Example of a strong description:
Books a confirmed appointment in the calendar system. 
Call this tool ONLY when the caller has verbally confirmed their desired date, time, and full name. 
Do not call this before confirming all three details with the caller.
Returns a booking reference number on success, or an error message if the slot is unavailable.
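The same description can carry over into a full tool definition, where parameter descriptions answer question 3 ("What information do I need before calling it?"). The sketch below assumes an OpenAI-style function-calling schema; your platform's tool form may use different field names, and the parameter names and date/time formats shown are hypothetical choices for illustration.

```python
import json

# Assumption: an OpenAI-style function-calling schema. The description text is what
# the LLM reads to decide when to call the tool; the parameters say what it must collect.
book_calendar_tool = {
    "type": "function",
    "function": {
        "name": "book_calendar",
        "description": (
            "Books a confirmed appointment in the calendar system. "
            "Call this tool ONLY when the caller has verbally confirmed their desired "
            "date, time, and full name. Do not call this before confirming all three "
            "details with the caller. Returns a booking reference number on success, "
            "or an error message if the slot is unavailable."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "full_name": {"type": "string", "description": "Caller's full name as confirmed on the call."},
                "date": {"type": "string", "description": "Appointment date, YYYY-MM-DD (hypothetical format)."},
                "time": {"type": "string", "description": "Start time, 24-hour HH:MM (hypothetical format)."},
            },
            "required": ["full_name", "date", "time"],
        },
    },
}

print(json.dumps(book_calendar_tool, indent=2))
```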

Fixing Conversation Flow Logic

If you are using the conversation flow (node-based) architecture, behavioral issues often come from misconfigured transition conditions.
1. Map the expected path

Draw or review the intended flow for the scenario where the issue occurs. Which node should the agent be on? Which node does it actually end up on?
2. Check transition conditions

Click each edge (connection between nodes) in the flow editor. The transition condition is evaluated after each agent turn. If it is too broad (e.g., “if the caller responds”), it may fire too early.
3. Check for missing transitions

If there is no valid transition condition for a caller input, the conversation flow may get stuck on the current node, causing repetitive behavior. Add a fallback transition for unexpected inputs.
4. Test with the flow simulator

Use the built-in simulator to walk through the conversation step by step. Enter example caller inputs and verify the agent takes the expected path through the flow.
5. Use Global Nodes for universal behaviors

If a behavior should apply at any point in the conversation (e.g., “if the caller asks to speak to a human, transfer them”), implement it as a Global Node. Global Nodes are evaluated regardless of where in the flow the conversation currently is.
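The steps above can be pictured with a small toy model of a flow: global rules are checked first, then the current node's transitions in order, then a fallback, and with no fallback the conversation stays on the same node (the "stuck" case from step 3). This is a hypothetical sketch, not the platform's engine: real transition conditions are judged by the LLM, whereas here simple keyword checks stand in for them, and conditions are checked against each caller input for simplicity. The `simulate` helper plays the role of walking example inputs through the flow, as in step 4.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Stand-in for an LLM-judged transition condition: caller's words in, match/no match out.
Condition = Callable[[str], bool]

@dataclass
class Node:
    name: str
    transitions: list[tuple[Condition, str]] = field(default_factory=list)  # checked in order
    fallback: Optional[str] = None  # target when no transition matches

def next_node(current: Node, caller_input: str, global_nodes: list[tuple[Condition, str]]) -> str:
    # Global rules are checked first, regardless of which node is active.
    for condition, target in global_nodes:
        if condition(caller_input):
            return target
    for condition, target in current.transitions:
        if condition(caller_input):
            return target
    # Nothing matched: take the fallback, or stay on this node (the "stuck" case).
    return current.fallback or current.name

def simulate(start: str, inputs: list[str], nodes: dict[str, Node],
             global_nodes: list[tuple[Condition, str]]) -> list[str]:
    """Walk example caller inputs through the flow and return the node path."""
    path = [start]
    for text in inputs:
        path.append(next_node(nodes[path[-1]], text, global_nodes))
    return path

def wants_booking(text: str) -> bool:
    return "book" in text.lower()

nodes = {
    "greeting": Node("greeting", [(wants_booking, "collect_details")], fallback="reprompt"),
    "reprompt": Node("reprompt", [(wants_booking, "collect_details")]),  # no fallback: can get stuck
    "collect_details": Node("collect_details", [(lambda t: "confirm" in t.lower(), "end")]),
    "transfer": Node("transfer"),
    "end": Node("end"),
}
global_nodes = [(lambda t: "human" in t.lower(), "transfer")]

print(simulate("greeting", ["Hi there", "I'd like to book", "Yes, please confirm"], nodes, global_nodes))
# ['greeting', 'reprompt', 'collect_details', 'end']
```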

Using Few-Shot Examples

For persistent behavioral issues, you can add few-shot examples directly in the system prompt to show the LLM the exact behavior you want. Format examples as a dialogue within the prompt:
Examples of correct behavior:

Caller: "What are your hours?"
Agent: "We're open Monday through Friday, 9am to 6pm."

Caller: "Can I book for Saturday?"
Agent: "Unfortunately we don't offer Saturday appointments. Would Monday or Tuesday work for you?"

Caller: "What do you charge?"
Agent: "I don't have pricing details on hand. I can connect you with our team who can help with that. Would you like me to transfer you?"
Three to five well-chosen examples often correct an LLM’s behavior more reliably than extensive instruction text.
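If you maintain prompts in code or a template repository, keeping the examples as data makes it easy to add, swap, or remove them without hand-editing the prompt. A minimal sketch with a hypothetical `format_examples` helper that renders `(caller, agent)` pairs in the dialogue layout used above; how the result reaches your agent (pasted into the prompt field or sent through an API) depends on your setup.

```python
def format_examples(pairs: list[tuple[str, str]]) -> str:
    """Render (caller, agent) pairs in the Caller:/Agent: dialogue layout."""
    lines = ["Examples of correct behavior:"]
    for caller, agent in pairs:
        lines += ["", f'Caller: "{caller}"', f'Agent: "{agent}"']
    return "\n".join(lines)

examples = [
    ("What are your hours?", "We're open Monday through Friday, 9am to 6pm."),
    ("Can I book for Saturday?",
     "Unfortunately we don't offer Saturday appointments. Would Monday or Tuesday work for you?"),
    ("What do you charge?",
     "I don't have pricing details on hand. I can connect you with our team who can help "
     "with that. Would you like me to transfer you?"),
]
system_prompt = "You are a booking assistant.\n\n" + format_examples(examples)
print(system_prompt)
```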

Testing Changes Before Going Live

Always test prompt and flow changes before deploying to production:

Use the Test Call feature

From your agent’s page, use the Test Call button to make a live call to your own phone. This runs the actual voice pipeline with real TTS and transcription.

Use the Flow Simulator

For conversation flow agents, the simulator lets you test transitions and node behavior without making a real call, so you can iterate quickly before committing changes.

A/B test prompt versions

Create a duplicate agent with the modified prompt and route a small percentage of real calls to it. Compare post-call analysis metrics between versions.

Review post-call analysis

Configure extraction fields that flag specific behavioral issues. After running test calls, check the analysis results to verify the issue is resolved.
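If you route calls to the duplicate agent yourself (for example, when choosing which agent places an outbound call), hashing a stable identifier gives a repeatable split: the same ID always lands in the same variant. A minimal sketch, assuming you have a call or caller ID and can export a per-call true/false flag from your post-call analysis extraction field; the function names and the 10% default share are hypothetical.

```python
import hashlib

def assign_variant(call_id: str, treatment_share: float = 0.1) -> str:
    """Deterministically route a fixed share of calls to the modified agent."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform value in [0, 1)
    return "modified" if bucket < treatment_share else "original"

def issue_rate(flags: list[bool]) -> float:
    """Share of analyzed calls where the extraction field flagged the issue."""
    return sum(flags) / len(flags) if flags else 0.0

variants = [assign_variant(f"call-{i}") for i in range(10_000)]
print(variants.count("modified") / len(variants))  # close to 0.1
```

Hashing the caller's phone number instead of the call ID keeps repeat callers on the same version. With a small treatment share, collect enough calls per variant before reading much into a difference in `issue_rate`.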