Choosing the Right Metrics

Not all metrics are created equal. The best monitoring systems focus on a small set of actionable metrics that directly inform decisions and drive improvements.

The RED Method for Services

Rate: Requests per second, or workflows per minute

Errors: Number or percentage of failed requests

Duration: Time to process requests (latency distribution)

These three metrics provide a solid foundation for monitoring any request-driven system.
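As a minimal sketch of how the three RED metrics can be derived from raw request records, the example below assumes a hypothetical `RequestRecord` type and a fixed observation window. The field names and the choice of the median as the duration summary are illustrative assumptions, not part of any particular monitoring library.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical record of one handled request (illustrative fields)."""
    duration_ms: float  # time taken to process the request
    ok: bool            # False if the request failed


def red_summary(records, window_seconds):
    """Compute Rate, Errors, and Duration for one observation window."""
    total = len(records)
    failed = sum(1 for r in records if not r.ok)
    durations = [r.duration_ms for r in records]
    return {
        # Rate: requests per second over the window
        "rate_per_sec": total / window_seconds,
        # Errors: both the count and the percentage of failed requests
        "error_count": failed,
        "error_pct": 100.0 * failed / total if total else 0.0,
        # Duration: median shown here; percentiles describe the tail better
        "duration_median_ms": statistics.median(durations) if durations else None,
    }


# Made-up sample: four requests observed over a 2-second window
sample = [
    RequestRecord(100, True),
    RequestRecord(200, True),
    RequestRecord(300, False),
    RequestRecord(400, True),
]
summary = red_summary(sample, window_seconds=2)
print(summary)
```

In practice a metrics library would usually maintain these values incrementally; the batch computation here is only meant to make the definitions concrete.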

Four Categories of Workflow Metrics

Organize your metrics into categories to ensure comprehensive coverage without overwhelming your team:

Performance: Speed, responsiveness, and throughput (how fast things happen)

Reliability: Success rates, errors, and retries (how often things work correctly)

Business: Cost, completion, and satisfaction (the business value delivered)

Resource: CPU, memory, and tokens (infrastructure and cost optimization)
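One way to check coverage across the four categories is to tag each tracked metric with its category and look for categories that have no metric at all. The sketch below assumes hypothetical metric names. Only the four category names come from the list above.

```python
from enum import Enum


class Category(Enum):
    PERFORMANCE = "performance"
    RELIABILITY = "reliability"
    BUSINESS = "business"
    RESOURCE = "resource"


# Hypothetical starter set of metrics, each tagged with one category
METRICS = {
    "latency_p95_ms": Category.PERFORMANCE,
    "throughput_per_min": Category.PERFORMANCE,
    "error_rate_pct": Category.RELIABILITY,
    "retry_count": Category.RELIABILITY,
    "cost_per_workflow_usd": Category.BUSINESS,
}


def uncovered_categories(metrics):
    """Return the categories that no tracked metric belongs to."""
    return set(Category) - set(metrics.values())


missing = uncovered_categories(METRICS)
print(sorted(c.value for c in missing))  # ['resource']
```

Here the starter set has no Resource metric, so adding something like token usage or memory consumption would close the gap.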

Metric Explorer

The explorer covers 12 essential workflow metrics organized by category. Three metrics from the Performance category are shown here:

Latency (P50, P95, P99)

Category: Performance

Time to complete workflow operations at different percentiles.

Example: P95 = 850ms means 95% of requests complete within 850ms.

Why it matters: Percentiles reveal tail latency issues that averages hide.
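To make the difference between percentiles and averages concrete, here is a small sketch using the nearest-rank percentile definition. That definition is one of several common ones and is chosen here as an assumption for simplicity. The latency samples are made up so that P95 lands on the 850ms figure used in the example above.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile() requires at least one sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Made-up data: 94 fast requests, one at 850ms, and five very slow ones
latencies_ms = [120] * 94 + [850] + [4000] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean = {mean_ms:.1f}ms")                      # mean = 321.3ms
print(f"P50  = {percentile(latencies_ms, 50)}ms")     # P50  = 120ms
print(f"P95  = {percentile(latencies_ms, 95)}ms")     # P95  = 850ms
print(f"P99  = {percentile(latencies_ms, 99)}ms")     # P99  = 4000ms
```

The mean of 321.3ms matches no real request: most requests finish in 120ms, while the slowest 5% take 4000ms, which only P99 exposes.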

Throughput

Category: Performance

Number of workflows processed per unit time.

Example: 1,200 workflows/hour or 20 workflows/minute.

Why it matters: Measures system capacity and helps with scaling decisions.
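The unit conversion in the throughput example can be sketched as follows. The function name and its inputs (a count of completed workflows and the length of the observation window in seconds) are illustrative assumptions.

```python
def throughput(completed_count, window_seconds):
    """Convert a completed-workflow count over a window into standard rates."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return {
        "per_minute": completed_count * 60 / window_seconds,
        "per_hour": completed_count * 3600 / window_seconds,
    }


# 1,200 workflows completed during a one-hour window
rates = throughput(1200, window_seconds=3600)
print(rates)  # {'per_minute': 20.0, 'per_hour': 1200.0}
```

Reporting the same measurement in both units helps when short-lived bursts and hourly capacity planning are discussed side by side.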

Time to First Token

Category: Performance

Latency until the first LLM response token arrives.

Example: TTFT = 420ms for a GPT-4 call.

Why it matters: Impacts perceived responsiveness in streaming scenarios.
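A rough sketch of measuring TTFT is to start a timer before consuming a token stream and record the elapsed time when the first token arrives. The `fake_stream` generator below is a hypothetical stand-in for a streaming LLM response. A real client library would supply its own iterator, and the delays used here are arbitrary.

```python
import time


def measure_ttft(token_stream, clock=time.perf_counter):
    """Consume an iterable of tokens and return
    (ttft_seconds, total_seconds, tokens)."""
    start = clock()
    ttft = None
    tokens = []
    for token in token_stream:
        if ttft is None:
            # First token has just arrived
            ttft = clock() - start
        tokens.append(token)
    total = clock() - start
    return ttft, total, tokens


def fake_stream(first_delay=0.05, gap=0.01, tokens=("Hello", ",", " world")):
    """Hypothetical streaming response: waits before the first token,
    then emits the remaining tokens with a short gap between them."""
    time.sleep(first_delay)
    for i, tok in enumerate(tokens):
        if i:
            time.sleep(gap)
        yield tok


ttft, total, tokens = measure_ttft(fake_stream())
print(f"TTFT = {ttft * 1000:.0f}ms, total = {total * 1000:.0f}ms")
```

Tracking TTFT separately from total duration matters because a streaming response can feel fast to the user even when the full completion takes much longer.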

💡

Best Practice

Start with 5-7 core metrics that cover the RED method plus your critical business outcomes. You can always add more later, but tracking too many metrics early on leads to alert fatigue and unclear priorities.
