User can run simulation tests to automatically evaluate the agent across multiple scenarios. This provides comprehensive testing beyond manual web and phone calls.

What Are Simulation Tests?

Simulation tests allow the user to:
  • Create test scenarios
  • Run automated conversations
  • Evaluate agent responses
  • Generate performance insights
  • Identify improvement areas
Simulation tests are ideal for regression testing and quality assurance before production deployment.

Evaluation Suite

[Screenshot: agent edit screen - evaluation suite panel]
User can access the evaluation suite from the agent edit screen. The evaluation suite provides:
  • Scenario management
  • Test run execution
  • Results analysis
  • Performance insights

Creating Test Scenarios

User can create test scenarios to evaluate the agent.
[Screenshot: agent edit screen - evaluation suite - create or generate scenario]

Create Scenario Manually

User can create a scenario manually by defining the following.
Scenario Information:
  • Scenario name
  • Description
  • Expected outcome
  • Success criteria
Test Conversation:
  • User messages (what caller says)
  • Expected agent responses
  • Required actions (tools to use)
  • Conversation flow
Example Scenario:
Scenario: Appointment Booking - Happy Path

User: "Hi, I'd like to book an appointment"
Expected: Agent asks for preferred date

User: "Next Tuesday"
Expected: Agent checks availability, presents time options

User: "2 PM works for me"
Expected: Agent confirms booking, collects contact info

User: "john@email.com, 555-1234"
Expected: Agent confirms details, uses calendar_booking tool

Success Criteria:
- Appointment booked successfully
- Confirmation email sent
- Call ended politely
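
The same scenario can also be kept in version control as structured data, which makes it easier to review and regenerate. A minimal sketch in Python, assuming a hypothetical field layout (the platform's actual scenario format may differ):

```python
# Hypothetical scenario definition; field names are illustrative,
# not the platform's actual schema.
appointment_happy_path = {
    "name": "Appointment Booking - Happy Path",
    "description": "Caller books an appointment with no complications.",
    "turns": [
        {"user": "Hi, I'd like to book an appointment",
         "expected": "Agent asks for preferred date"},
        {"user": "Next Tuesday",
         "expected": "Agent checks availability, presents time options"},
        {"user": "2 PM works for me",
         "expected": "Agent confirms booking, collects contact info"},
        {"user": "john@email.com, 555-1234",
         "expected": "Agent confirms details, uses calendar_booking tool"},
    ],
    "success_criteria": [
        "Appointment booked successfully",
        "Confirmation email sent",
        "Call ended politely",
    ],
}
```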

Generate Scenarios with AI

User can auto-generate scenarios:
  1. Click “Generate Scenario”
  2. Describe scenario type
  3. AI generates test conversation
  4. User reviews and edits
  5. Save scenario
Generate options:
  • Happy path scenarios
  • Error handling scenarios
  • Edge case scenarios
  • Compliance testing scenarios
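
The built-in generator handles this in the UI, but the underlying idea is simple prompt construction. A minimal sketch, assuming a hypothetical generate_text LLM helper; review and edit whatever it drafts before saving:

```python
def build_generation_prompt(agent_description: str, scenario_type: str) -> str:
    """Compose a prompt asking an LLM to draft one test scenario.

    scenario_type is e.g. "happy path", "error handling",
    "edge case", or "compliance".
    """
    return (
        "You are designing a simulation test for this voice agent:\n"
        f"{agent_description}\n\n"
        f"Write one {scenario_type} scenario as alternating user turns and "
        "expected agent behaviors, followed by explicit success criteria."
    )

# draft = generate_text(build_generation_prompt(agent_desc, "error handling"))
# generate_text is a placeholder for whatever LLM client is available.
```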

Running Test Scenarios

User can execute test runs.
[Screenshot: agent edit screen - evaluation suite - create test run]

Create Test Run

User can run tests:
  1. Select scenarios to test
  2. Configure test parameters
  3. Start test run
  4. Wait for completion
  5. Review results
Test Run Configuration:
  • Scenarios to include
  • Number of iterations
  • Test environment
  • Evaluation criteria
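
A test run configuration can be thought of as a small, declarative object. A sketch with illustrative (not platform-defined) field names:

```python
# Hypothetical test run configuration; keys are illustrative only.
test_run = {
    "scenarios": [
        "Appointment Booking - Happy Path",
        "Missing Contact Info",
        "Calendar API Down",
    ],
    "iterations": 3,            # repeat each scenario to catch flaky behavior
    "environment": "staging",   # keep simulations away from production data
    "evaluation": {
        "check_tool_calls": True,
        "check_guardrails": True,
        "max_response_latency_ms": 2000,
    },
}
```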

Viewing Test Results

User can analyze test results.
[Screenshot: agent edit screen - evaluation suite panel - view test results]

Test Results Include

Per Scenario:
  • Pass/Fail status
  • Response accuracy
  • Tool execution success
  • Conversation flow correctness
  • Response time metrics
Detailed Metrics:
  • Intent recognition accuracy
  • Entity extraction accuracy
  • Tool usage correctness
  • Response relevance
  • Conversation completion rate
Failure Analysis:
  • Where agent failed
  • Why it failed
  • Actual vs expected response
  • Suggested improvements
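
If results are exported from a test run, the headline numbers are straightforward to compute. A sketch assuming each exported result is a dict with "scenario", "passed", and "failure_reason" fields (a hypothetical shape, not the platform's export format):

```python
from collections import Counter

def summarize_results(results: list[dict]) -> dict:
    """Aggregate per-scenario results into headline metrics."""
    total = len(results)
    passed = sum(r["passed"] for r in results)
    failures = Counter(
        r["failure_reason"] for r in results if not r["passed"]
    )
    return {
        "pass_rate": passed / total if total else 0.0,
        "failed_scenarios": [r["scenario"] for r in results if not r["passed"]],
        "common_failure_reasons": failures.most_common(3),
    }
```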

Generating Insights

User can generate AI-powered insights.
[Screenshot: agent edit screen - evaluation suite panel - generate/view insights]

Insights Provide

Performance Summary:
  • Overall pass rate
  • Common failure patterns
  • Response quality trends
  • Tool execution reliability
Improvement Recommendations:
  • Prompt adjustments
  • Tool configuration changes
  • Guardrail updates
  • Training data needs
Trend Analysis:
  • Performance over time
  • Regression detection
  • Quality improvements
  • Consistency metrics
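
Regression detection is essentially a comparison of pass/fail outcomes between two runs. A minimal sketch, assuming each run is summarized as a mapping from scenario name to pass flag:

```python
def detect_regressions(previous: dict[str, bool],
                       current: dict[str, bool]) -> list[str]:
    """Return scenarios that passed in the previous run but fail now."""
    return [
        name for name, passed_before in previous.items()
        if passed_before and not current.get(name, False)
    ]

# Example: a scenario that regressed after a prompt update.
# detect_regressions({"Happy Path": True}, {"Happy Path": False})
# -> ["Happy Path"]
```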

Test Scenario Types

Happy Path Scenarios

User should test standard flows.
Appointment Booking:
  • User requests appointment
  • Provides all information clearly
  • Accepts offered time slot
  • Completes booking
Information Request:
  • User asks clear question
  • Agent retrieves information
  • User satisfied with answer
  • Call ends politely
Product Inquiry:
  • User asks about product
  • Agent provides details
  • User has follow-up questions
  • Agent answers thoroughly

Error Handling Scenarios

User should test error conditions.
Missing Information:
  • User doesn’t provide required data
  • Agent prompts for missing info
  • User eventually provides it
  • Task completes
Tool Failures:
  • Calendar API is down
  • Webhook times out
  • Email service fails
  • Agent handles gracefully
Unclear Input:
  • User mumbles or speaks unclearly
  • Background noise present
  • Agent asks for clarification
  • Conversation continues

Edge Case Scenarios

User should test unusual situations.
Topic Changes:
  • User starts one topic
  • Switches to different topic
  • Switches back
  • Agent tracks context
Interruptions:
  • User interrupts agent
  • Agent stops and listens
  • Conversation continues naturally
Off-Topic Questions:
  • User asks unrelated questions
  • Agent uses guardrails
  • Redirects or transfers
  • Maintains professionalism

Compliance Scenarios

User should test compliance.
Data Privacy:
  • Agent doesn’t share sensitive data
  • Verifies identity before sharing
  • Follows HIPAA/GDPR rules
Prohibited Topics:
  • Agent refuses medical advice
  • Declines legal advice
  • Redirects appropriately
Escalation:
  • Agent transfers when required
  • Explains transfer reason
  • Connects to right department
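
Compliance scenarios fit the same structure as functional ones. A short sketch in the same hypothetical format used for the appointment example above:

```python
prohibited_topic_scenario = {
    "name": "Prohibited Topics - Medical Advice",
    "turns": [
        {"user": "What medication should I take for my back pain?",
         "expected": "Agent declines to give medical advice and offers to "
                     "book an appointment or transfer to a clinician"},
    ],
    "success_criteria": [
        "No medical advice given",
        "Caller redirected or escalated appropriately",
    ],
}
```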

Best Practices

Scenario Design

User should:
  • Cover all conversation paths
  • Include success and failure cases
  • Test each tool thoroughly
  • Simulate real user behavior
  • Use actual customer language
User should avoid:
  • Only testing happy paths
  • Unrealistic scenarios
  • Missing edge cases
  • Not testing tools
  • Perfect input only

Test Coverage

User must test:
Core Functions:
  • All primary tasks
  • All enabled tools
  • All conversation flows
  • All escalation paths
Quality Metrics:
  • Response accuracy
  • Response time
  • Natural conversation flow
  • Tool execution success
Compliance:
  • Guardrail effectiveness
  • Privacy protection
  • Prohibited topic handling
  • Escalation triggers

Regular Testing

User should run simulation tests:
Before Deployment:
  • Initial configuration
  • After major changes
  • Before production release
Ongoing:
  • Weekly regression tests
  • After prompt updates
  • After tool configuration changes
  • When issues reported
Performance Monitoring:
  • Monthly comprehensive tests
  • Quarterly reviews
  • Annual audits

Interpreting Results

Pass/Fail Criteria

Pass Criteria:
  • Agent response matches expected behavior
  • Tools executed correctly
  • Conversation completed successfully
  • Guardrails respected
Fail Criteria:
  • Wrong intent detected
  • Tool not executed when required
  • Prohibited topics discussed
  • Conversation incomplete
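
Expressed as code, the pass decision for one scenario is a conjunction of these checks. A sketch over a hypothetical result shape that mirrors the criteria above:

```python
def scenario_passed(result: dict) -> bool:
    """Apply the pass/fail criteria to a single scenario result.

    The field names mirror the criteria listed above; they are
    illustrative, not a specific platform API.
    """
    return (
        result["intent_correct"]
        and result["required_tools_executed"]
        and result["conversation_completed"]
        and not result["guardrail_violations"]
    )
```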

Common Failure Patterns

Intent Misrecognition:
  • User intent not understood
  • Wrong task initiated
  • Fix: Update identity/tasks section
Tool Execution Errors:
  • Tool not triggered
  • Tool triggered incorrectly
  • Fix: Update tool prompting
Guardrail Violations:
  • Prohibited topics discussed
  • Missing escalation
  • Fix: Strengthen guardrails section
Conversation Flow Issues:
  • Awkward transitions
  • Missing context
  • Fix: Improve task workflows

Iterating Based on Results

User can improve the agent:
  1. Review failed scenarios
  2. Identify failure patterns
  3. Update configuration
  4. Re-run tests
  5. Verify improvements
Iteration Workflow:
Test → Analyze Results → Update Config → Re-test → Deploy

Integration with Development

User can integrate testing into the development workflow.
Development Process:
  1. Make configuration change
  2. Run relevant simulation tests
  3. Review results
  4. Fix issues
  5. Re-test until passing
  6. Deploy to production
CI/CD Integration:
  • Automated test runs
  • Pre-deployment checks
  • Quality gates
  • Regression detection
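
A quality gate can be as simple as a script that fails the pipeline when the pass rate drops below target. A sketch, assuming test results have been exported as a list of {"scenario", "passed"} records (a hypothetical export shape):

```python
import sys

PASS_RATE_THRESHOLD = 0.90  # matches the 90%+ target suggested under Next Steps

def quality_gate(results: list[dict]) -> None:
    """Fail the CI job when the simulation pass rate is below threshold."""
    pass_rate = sum(r["passed"] for r in results) / max(len(results), 1)
    print(f"Simulation pass rate: {pass_rate:.0%}")
    if pass_rate < PASS_RATE_THRESHOLD:
        sys.exit(1)  # non-zero exit blocks the deployment step
```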

Simulation vs Manual Testing

Manual Testing (Web/Phone):
  • Quick feedback
  • Real-time interaction
  • Subjective evaluation
  • Good for initial testing
Simulation Testing:
  • Comprehensive coverage
  • Automated execution
  • Objective metrics
  • Good for regression testing
  • Scalable testing
Use Both:
  • Manual for rapid iteration
  • Simulation for thorough validation
  • Both before production

Next Steps

After simulation testing:
  1. Address all failing scenarios
  2. Achieve target pass rate (90%+)
  3. Run final manual tests
  4. Deploy to production
  5. Monitor real performance
  6. Update scenarios based on real calls
Simulation tests complement manual testing. Use both for comprehensive quality assurance.
Start with 10-20 key scenarios covering main flows, then expand to edge cases and error conditions.