Latency & Performance
Master strategies to optimize response times and deliver fast, responsive AI agents
Streaming & Async Patterns
Streaming and async execution transform the user experience even when actual latency stays the same. Streaming shows tokens as they are generated, so users see progress immediately instead of waiting for the complete response. Async patterns prevent blocking: while the LLM processes, your UI stays responsive and can handle other tasks.
Interactive: Streaming vs Batch Comparison
The interactive demo contrasts batch delivery (waiting for the full response) with streaming (rendering tokens as they arrive).
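Below is a minimal sketch of the same comparison in code, assuming the OpenAI Node SDK (`openai`) and an illustrative model name; any client that exposes a token stream follows the same shape:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Batch: nothing to show until the entire response has been generated.
async function batchReply(prompt: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model name
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}

// Streaming: hand each token to the UI as it arrives, so users see progress immediately.
async function streamingReply(
  prompt: string,
  onToken: (t: string) => void,
): Promise<string> {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) {
      full += token;
      onToken(token); // e.g. append to the chat UI
    }
  }
  return full;
}
```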
Streaming Best Practices
- ✓Optimize TTFT: Time to first token is critical—target <300ms
- ✓Stream UI updates: Show tokens as they arrive, not in large chunks
- ✓Handle errors gracefully: Streams can fail mid-response; show the partial content (see the sketch after this list)
- ✓Add visual feedback: Cursor animation, "typing..." indicator
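The TTFT and error-handling points above can be sketched as a small wrapper around any token stream. The stream here is just an `AsyncIterable<string>` and the helper name is hypothetical; the wrapper records time to first token and keeps whatever partial text was produced if the stream dies mid-response.

```typescript
// Consume a token stream, recording TTFT and keeping partial output if the stream fails.
async function consumeStream(
  stream: AsyncIterable<string>,        // each item is one token/chunk of text
  onToken: (t: string) => void,
): Promise<{ text: string; ttftMs: number | null; failed: boolean }> {
  const start = performance.now();
  let ttftMs: number | null = null;
  let text = "";
  try {
    for await (const token of stream) {
      if (ttftMs === null) ttftMs = performance.now() - start; // time to first token
      text += token;
      onToken(token);
    }
    return { text, ttftMs, failed: false };
  } catch (err) {
    // The stream died mid-response: surface the partial content instead of discarding it.
    console.error("stream failed after partial output", err);
    return { text, ttftMs, failed: true };
  }
}
```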
Async Execution Patterns
Use async/await for I/O operations. Don't block the main thread waiting for LLM responses.
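A minimal sketch, with a hypothetical `callLLM` helper standing in for your client: the `await` suspends only this handler, so the event loop keeps serving other work while the LLM generates.

```typescript
// Hypothetical LLM client call returning a Promise.
declare function callLLM(prompt: string): Promise<string>;

// Non-blocking: the await suspends only this handler; the event loop stays
// free to serve other requests and UI events while the LLM generates.
async function handleRequest(prompt: string): Promise<string> {
  const reply = await callLLM(prompt); // I/O wait, no thread is blocked
  return reply;
}
```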
Execute independent LLM calls concurrently with Promise.all(): three sequential 500 ms calls take about 1500 ms, while the same three calls in parallel finish in roughly 500 ms, the duration of the slowest call.
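A sketch of both versions, reusing the hypothetical `callLLM` from above; total time for the parallel version tracks the slowest call rather than the sum.

```typescript
// Sequential: ~1500 ms for three independent 500 ms calls.
async function sequential(prompts: string[]): Promise<string[]> {
  const results: string[] = [];
  for (const p of prompts) {
    results.push(await callLLM(p)); // each call waits for the previous one
  }
  return results;
}

// Parallel: ~500 ms, all three calls are in flight at once.
async function parallel(prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((p) => callLLM(p)));
}
```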
Offload non-critical tasks (logging, analytics, embeddings) to queues or background workers. Don't make users wait.
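One way to sketch this, with a hypothetical `queue` standing in for whatever job system you actually use (BullMQ, SQS, an in-process worker): reply first, enqueue the rest without awaiting it.

```typescript
// Hypothetical background job queue; swap in your real queue client.
declare const queue: {
  enqueue(job: { type: string; payload: unknown }): Promise<void>;
};

async function answer(userId: string, prompt: string): Promise<string> {
  const reply = await callLLM(prompt);

  // Fire-and-forget: logging, analytics, and embedding generation happen in
  // the background; the user gets the reply without waiting for them.
  void queue
    .enqueue({ type: "log-and-embed", payload: { userId, prompt, reply } })
    .catch((err) => console.error("background job enqueue failed", err));

  return reply;
}
```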
Show cached or fast results immediately, then enhance them with LLM results when ready. For example, display a rule-based response first, then replace it with the LLM's reasoning once it arrives.
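A sketch of that progression, with hypothetical `ruleBasedAnswer` and `render` helpers and the same hypothetical `callLLM` as above: the fast answer ships immediately and is replaced when the LLM finishes.

```typescript
// Hypothetical helpers: a fast rule-based responder and a UI render function.
declare function ruleBasedAnswer(prompt: string): string;
declare function render(text: string, opts: { provisional: boolean }): void;

async function progressiveAnswer(prompt: string): Promise<void> {
  // 1. Show the fast, rule-based answer immediately (marked as provisional).
  render(ruleBasedAnswer(prompt), { provisional: true });

  // 2. Replace it with the LLM's answer when it arrives.
  const better = await callLLM(prompt);
  render(better, { provisional: false });
}
```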
Users often perceive streaming as 40-60% faster than batch even when total latency is identical. Show progress, keep the UI responsive, and provide immediate feedback: a 1-second response that streams feels faster than an 800 ms batch response with no feedback. Optimize for user experience, not just raw milliseconds.