User can run simulation tests to automatically evaluate the agent across multiple scenarios. This provides comprehensive testing beyond manual web and phone calls.

What Are Simulation Tests?

Simulation tests allow the user to:
  • Create test scenarios
  • Run automated conversations
  • Evaluate agent responses
  • Generate performance insights
  • Identify improvement areas
Simulation tests are ideal for regression testing and quality assurance before production deployment.

Evaluation Suite

[Screenshot: agent edit screen - evaluation suite panel]
User can access the evaluation suite from the agent edit screen. The evaluation suite provides:
  • Scenario management
  • Test run execution
  • Results analysis
  • Performance insights

Creating Test Scenarios

User can create test scenarios to evaluate the agent.
[Screenshot: agent edit screen - evaluation suite - create or generate scenario]

Create Scenario Manually

User can create a scenario manually by defining the following.
Scenario Information:
  • Scenario name
  • Description
  • Expected outcome
  • Success criteria
Test Conversation:
  • User messages (what caller says)
  • Expected agent responses
  • Required actions (tools to use)
  • Conversation flow
Example Scenario:
Scenario: Appointment Booking - Happy Path

User: "Hi, I'd like to book an appointment"
Expected: Agent asks for preferred date

User: "Next Tuesday"
Expected: Agent checks availability, presents time options

User: "2 PM works for me"
Expected: Agent confirms booking, collects contact info

User: "john@email.com, 555-1234"
Expected: Agent confirms details, uses calendar_booking tool

Success Criteria:
- Appointment booked successfully
- Confirmation email sent
- Call ended politely
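
The same scenario can also be kept in version control as structured data, which makes it easier to review and regenerate. A minimal sketch in Python, assuming a hypothetical field layout (the platform's actual scenario format may differ):

```python
# Hypothetical scenario definition; field names are illustrative,
# not the platform's actual schema.
appointment_happy_path = {
    "name": "Appointment Booking - Happy Path",
    "description": "Caller books an appointment with no complications.",
    "turns": [
        {"user": "Hi, I'd like to book an appointment",
         "expected": "Agent asks for preferred date"},
        {"user": "Next Tuesday",
         "expected": "Agent checks availability, presents time options"},
        {"user": "2 PM works for me",
         "expected": "Agent confirms booking, collects contact info"},
        {"user": "john@email.com, 555-1234",
         "expected": "Agent confirms details, uses calendar_booking tool"},
    ],
    "success_criteria": [
        "Appointment booked successfully",
        "Confirmation email sent",
        "Call ended politely",
    ],
}
```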

Generate Scenarios with AI

User can auto-generate scenarios:
  1. Click “Generate Scenario”
  2. Describe scenario type
  3. AI generates test conversation
  4. User reviews and edits
  5. Save scenario
Generate options:
  • Happy path scenarios
  • Error handling scenarios
  • Edge case scenarios
  • Compliance testing scenarios
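
The built-in generator handles this in the UI, but the underlying idea is simple prompt construction. A minimal sketch, assuming a hypothetical generate_text LLM helper; review and edit whatever it drafts before saving:

```python
def build_generation_prompt(agent_description: str, scenario_type: str) -> str:
    """Compose a prompt asking an LLM to draft one test scenario.

    scenario_type is e.g. "happy path", "error handling",
    "edge case", or "compliance".
    """
    return (
        "You are designing a simulation test for this voice agent:\n"
        f"{agent_description}\n\n"
        f"Write one {scenario_type} scenario as alternating user turns and "
        "expected agent behaviors, followed by explicit success criteria."
    )

# draft = generate_text(build_generation_prompt(agent_desc, "error handling"))
# generate_text is a placeholder for whatever LLM client is available.
```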

Running Test Scenarios

User can execute test runs.
[Screenshot: agent edit screen - evaluation suite - create test run]

Create Test Run

User can run tests:
  1. Select scenarios to test
  2. Configure test parameters
  3. Start test run
  4. Wait for completion
  5. Review results
Test Run Configuration:
  • Scenarios to include
  • Number of iterations
  • Test environment
  • Evaluation criteria
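
A test run configuration can be thought of as a small, declarative object. A sketch with illustrative (not platform-defined) field names:

```python
# Hypothetical test run configuration; keys are illustrative only.
test_run = {
    "scenarios": [
        "Appointment Booking - Happy Path",
        "Missing Contact Info",
        "Calendar API Down",
    ],
    "iterations": 3,            # repeat each scenario to catch flaky behavior
    "environment": "staging",   # keep simulations away from production data
    "evaluation": {
        "check_tool_calls": True,
        "check_guardrails": True,
        "max_response_latency_ms": 2000,
    },
}
```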

Viewing Test Results

User can analyze test results.
[Screenshot: agent edit screen - evaluation suite panel - view test results]

Test Results Include

Per Scenario:
  • Pass/Fail status
  • Response accuracy
  • Tool execution success
  • Conversation flow correctness
  • Response time metrics
Detailed Metrics:
  • Intent recognition accuracy
  • Entity extraction accuracy
  • Tool usage correctness
  • Response relevance
  • Conversation completion rate
Failure Analysis:
  • Where agent failed
  • Why it failed
  • Actual vs expected response
  • Suggested improvements
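
If results are exported from a test run, the headline numbers are straightforward to compute. A sketch assuming each exported result is a dict with "scenario", "passed", and "failure_reason" fields (a hypothetical shape, not the platform's export format):

```python
from collections import Counter

def summarize_results(results: list[dict]) -> dict:
    """Aggregate per-scenario results into headline metrics."""
    total = len(results)
    passed = sum(r["passed"] for r in results)
    failures = Counter(
        r["failure_reason"] for r in results if not r["passed"]
    )
    return {
        "pass_rate": passed / total if total else 0.0,
        "failed_scenarios": [r["scenario"] for r in results if not r["passed"]],
        "common_failure_reasons": failures.most_common(3),
    }
```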

Generating Insights

User can generate AI-powered insights.
[Screenshot: agent edit screen - evaluation suite panel - generate/view insights]

Insights Provide

Performance Summary:
  • Overall pass rate
  • Common failure patterns
  • Response quality trends
  • Tool execution reliability
Improvement Recommendations:
  • Prompt adjustments
  • Tool configuration changes
  • Guardrail updates
  • Training data needs
Trend Analysis:
  • Performance over time
  • Regression detection
  • Quality improvements
  • Consistency metrics
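
Regression detection is essentially a comparison of pass/fail outcomes between two runs. A minimal sketch, assuming each run is summarized as a mapping from scenario name to pass flag:

```python
def detect_regressions(previous: dict[str, bool],
                       current: dict[str, bool]) -> list[str]:
    """Return scenarios that passed in the previous run but fail now."""
    return [
        name for name, passed_before in previous.items()
        if passed_before and not current.get(name, False)
    ]

# Example: a scenario that regressed after a prompt update.
# detect_regressions({"Happy Path": True}, {"Happy Path": False})
# -> ["Happy Path"]
```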

Test Scenario Types

Happy Path Scenarios

User should test standard flows.
Appointment Booking:
  • User requests appointment
  • Provides all information clearly
  • Accepts offered time slot
  • Completes booking
Information Request:
  • User asks clear question
  • Agent retrieves information
  • User satisfied with answer
  • Call ends politely
Product Inquiry:
  • User asks about product
  • Agent provides details
  • User has follow-up questions
  • Agent answers thoroughly

Error Handling Scenarios

User should test error conditions.
Missing Information:
  • User doesn’t provide required data
  • Agent prompts for missing info
  • User eventually provides it
  • Task completes
Tool Failures:
  • Calendar API is down
  • Webhook times out
  • Email service fails
  • Agent handles gracefully
Unclear Input:
  • User mumbles or speaks unclearly
  • Background noise present
  • Agent asks for clarification
  • Conversation continues

Edge Case Scenarios

User should test unusual situations.
Topic Changes:
  • User starts one topic
  • Switches to different topic
  • Switches back
  • Agent tracks context
Interruptions:
  • User interrupts agent
  • Agent stops and listens
  • Conversation continues naturally
Off-Topic Questions:
  • User asks unrelated questions
  • Agent uses guardrails
  • Redirects or transfers
  • Maintains professionalism

Compliance Scenarios

User should test compliance.
Data Privacy:
  • Agent doesn’t share sensitive data
  • Verifies identity before sharing
  • Follows HIPAA/GDPR rules
Prohibited Topics:
  • Agent refuses medical advice
  • Declines legal advice
  • Redirects appropriately
Escalation:
  • Agent transfers when required
  • Explains transfer reason
  • Connects to right department
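
Compliance scenarios fit the same structure as functional ones. A short sketch in the same hypothetical format used for the appointment example above:

```python
prohibited_topic_scenario = {
    "name": "Prohibited Topics - Medical Advice",
    "turns": [
        {"user": "What medication should I take for my back pain?",
         "expected": "Agent declines to give medical advice and offers to "
                     "book an appointment or transfer to a clinician"},
    ],
    "success_criteria": [
        "No medical advice given",
        "Caller redirected or escalated appropriately",
    ],
}
```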

Best Practices

Scenario Design

User should:
  • Cover all conversation paths
  • Include success and failure cases
  • Test each tool thoroughly
  • Simulate real user behavior
  • Use actual customer language
User should avoid:
  • Only testing happy paths
  • Unrealistic scenarios
  • Missing edge cases
  • Not testing tools
  • Perfect input only

Test Coverage

User must test:
Core Functions:
  • All primary tasks
  • All enabled tools
  • All conversation flows
  • All escalation paths
Quality Metrics:
  • Response accuracy
  • Response time
  • Natural conversation flow
  • Tool execution success
Compliance:
  • Guardrail effectiveness
  • Privacy protection
  • Prohibited topic handling
  • Escalation triggers

Regular Testing

User should run simulation tests:
Before Deployment:
  • Initial configuration
  • After major changes
  • Before production release
Ongoing:
  • Weekly regression tests
  • After prompt updates
  • After tool configuration changes
  • When issues reported
Performance Monitoring:
  • Monthly comprehensive tests
  • Quarterly reviews
  • Annual audits

Interpreting Results

Pass/Fail Criteria

Pass Criteria:
  • Agent response matches expected behavior
  • Tools executed correctly
  • Conversation completed successfully
  • Guardrails respected
Fail Criteria:
  • Wrong intent detected
  • Tool not executed when required
  • Prohibited topics discussed
  • Conversation incomplete
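
Expressed as code, the pass decision for one scenario is a conjunction of these checks. A sketch over a hypothetical result shape that mirrors the criteria above:

```python
def scenario_passed(result: dict) -> bool:
    """Apply the pass/fail criteria to a single scenario result.

    The field names mirror the criteria listed above; they are
    illustrative, not a specific platform API.
    """
    return (
        result["intent_correct"]
        and result["required_tools_executed"]
        and result["conversation_completed"]
        and not result["guardrail_violations"]
    )
```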

Common Failure Patterns

Intent Misrecognition:
  • User intent not understood
  • Wrong task initiated
  • Fix: Update identity/tasks section
Tool Execution Errors:
  • Tool not triggered
  • Tool triggered incorrectly
  • Fix: Update tool prompting
Guardrail Violations:
  • Prohibited topics discussed
  • Missing escalation
  • Fix: Strengthen guardrails section
Conversation Flow Issues:
  • Awkward transitions
  • Missing context
  • Fix: Improve task workflows

Iterating Based on Results

User can improve the agent:
  1. Review failed scenarios
  2. Identify failure patterns
  3. Update configuration
  4. Re-run tests
  5. Verify improvements
Iteration Workflow:
Test → Analyze Results → Update Config → Re-test → Deploy

Integration with Development

User can integrate testing into the development workflow.
Development Process:
  1. Make configuration change
  2. Run relevant simulation tests
  3. Review results
  4. Fix issues
  5. Re-test until passing
  6. Deploy to production
CI/CD Integration:
  • Automated test runs
  • Pre-deployment checks
  • Quality gates
  • Regression detection
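
A quality gate can be as simple as a script that fails the pipeline when the pass rate drops below target. A sketch, assuming test results have been exported as a list of {"scenario", "passed"} records (a hypothetical export shape):

```python
import sys

PASS_RATE_THRESHOLD = 0.90  # matches the 90%+ target suggested under Next Steps

def quality_gate(results: list[dict]) -> None:
    """Fail the CI job when the simulation pass rate is below threshold."""
    pass_rate = sum(r["passed"] for r in results) / max(len(results), 1)
    print(f"Simulation pass rate: {pass_rate:.0%}")
    if pass_rate < PASS_RATE_THRESHOLD:
        sys.exit(1)  # non-zero exit blocks the deployment step
```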

Simulation vs Manual Testing

Manual Testing (Web/Phone):
  • Quick feedback
  • Real-time interaction
  • Subjective evaluation
  • Good for initial testing
Simulation Testing:
  • Comprehensive coverage
  • Automated execution
  • Objective metrics
  • Good for regression testing
  • Scalable testing
Use Both:
  • Manual for rapid iteration
  • Simulation for thorough validation
  • Both before production

Next Steps

After simulation testing:
  1. Address all failing scenarios
  2. Achieve target pass rate (90%+)
  3. Run final manual tests
  4. Deploy to production
  5. Monitor real performance
  6. Update scenarios based on real calls
Simulation tests complement manual testing. Use both for comprehensive quality assurance.
Start with 10-20 key scenarios covering main flows, then expand to edge cases and error conditions.