User Experience Metrics

Master UX metrics to measure and optimize AI agent performance from the user perspective

Response Quality Metrics

Quality metrics assess whether responses are accurate, relevant, complete, clear, and helpful. While technical metrics measure model performance, quality metrics capture user perception of value. A technically accurate response that users find confusing or incomplete scores low on quality.

Interactive: Quality Score Calculator

Adjust quality dimensions to see how they affect overall response quality. Different use cases prioritize different dimensions:

Overall Quality Score
Weighted average across all dimensions
85%
Accuracy
Weight: 30%
85%
Relevance
Weight: 25%
78%
Completeness
Weight: 20%
92%
Clarity
Weight: 15%
88%
Helpfulness
Weight: 10%
81%
Insight: Excellent quality! Responses are accurate, relevant, and helpful. Users are highly satisfied.

Quality Dimensions Explained

  • Accuracy: Factually correct information. No hallucinations or errors. Verify with ground truth.
  • Relevance: Directly addresses user query. No tangents or off-topic content. Context-appropriate.
  • Completeness: Provides all necessary information. No critical gaps. Users don't need follow-ups.
  • Clarity: Easy to understand. No jargon or confusing explanations. Well-structured.
  • Helpfulness: Actually solves user problem. Actionable guidance. Empathetic tone.

How to Measure Quality

1. Human evaluation

Sample 100-200 responses weekly. Raters score each dimension 1-5. Expensive but accurate.

2. LLM-as-judge

Use GPT-4 to evaluate responses on quality dimensions. Scale to 100% coverage. Correlate with human ratings.

3. User feedback correlation

Track which quality dimensions correlate with thumbs up/down. Optimize high-impact dimensions first.

💡
Quality Drives Satisfaction

Quality and satisfaction are highly correlated (r = 0.8-0.9). Improving quality from 75% to 85% typically increases satisfaction by 10-15 points. Focus on dimensions users care most about— for support agents, helpfulness and clarity matter more than completeness. For research agents, accuracy and completeness are critical.

Satisfaction Tracking