Agentic Design

Error Handling and Recovery Patterns (ERP)

Comprehensive error communication and recovery interface patterns for graceful failure handling in agent systems

Complexity: medium
UI/UX & Human-AI Interaction

🎯 30-Second Overview

Pattern: Comprehensive error communication and recovery interfaces for graceful failure handling

Why: Agent system failures need clear communication and effective recovery paths to maintain user trust

Key Insight: Structure every error message as Problem + Cause + Solution, preserve the user's context, and reveal technical detail through progressive disclosure
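
The three-element structure can be sketched as a small data type. This is a minimal illustration, not a prescribed API: the `ErrorMessage` class, its field names, and the sample text are all assumptions made for the example. Technical detail stays hidden until the user expands it, which is one simple form of progressive disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorMessage:
    """Hypothetical three-element error message (names are illustrative)."""

    problem: str        # what went wrong, in plain language
    cause: str          # why it happened
    solution: str       # what the user can do next
    details: str = ""   # technical detail, hidden unless requested

    def render(self, expanded: bool = False) -> str:
        """Return user-facing text; include technical details only when expanded."""
        lines = [
            self.problem,
            f"Why: {self.cause}",
            f"What to do: {self.solution}",
        ]
        if expanded and self.details:
            lines.append(f"Details: {self.details}")
        return "\n".join(lines)


msg = ErrorMessage(
    problem="The agent could not finish summarizing your document.",
    cause="The connection to the summarization service timed out.",
    solution="Your draft is saved. Retry now, or keep editing and retry later.",
    details="TimeoutError after 30s calling summarize()",
)
print(msg.render())               # default view: problem, cause, solution
print(msg.render(expanded=True))  # adds the technical details line
```

Keeping `details` out of the default rendering lets end users see actionable text first while still giving support staff a way to reach the underlying error.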

⚡ Quick Implementation

1. Error Classification: Categorize errors by severity, recoverability, and user impact
2. Display Strategy: Choose the appropriate modality: inline, tooltip, modal, or banner
3. Message Structure: Problem + Cause + Solution with progressive disclosure
4. Recovery Mechanisms: Automatic retry, context preservation, and manual recovery
5. Prevention Patterns: Input validation, confirmation dialogs, and undo capabilities

Example: agent_failure → classify_error → context_preserved → recovery_options → user_choice
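
The flow in the example line can be sketched as a single classification step that also snapshots the user's context. Everything here is an assumption for illustration: the `Severity` and `Display` enums, the `classify_error` function, the option names, and the mapping from exception types to modalities are hypothetical, not a fixed taxonomy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Display(Enum):
    INLINE = "inline"
    TOOLTIP = "tooltip"
    BANNER = "banner"
    MODAL = "modal"


@dataclass
class ClassifiedError:
    severity: Severity
    recoverable: bool
    display: Display
    recovery_options: list[str] = field(default_factory=list)
    preserved_context: dict = field(default_factory=dict)


def classify_error(exc: Exception, context: dict) -> ClassifiedError:
    """Map an exception to severity, display modality, and recovery options.

    The mapping is an illustrative assumption; real systems define their own.
    """
    snapshot = dict(context)  # shallow copy so later changes cannot lose user work
    if isinstance(exc, ValueError):
        # Validation problem: low severity, shown next to the offending input.
        return ClassifiedError(Severity.LOW, True, Display.INLINE,
                               ["edit_input"], snapshot)
    if isinstance(exc, TimeoutError):
        # Transient failure: non-blocking banner with retry choices.
        return ClassifiedError(Severity.MEDIUM, True, Display.BANNER,
                               ["retry", "continue_offline"], snapshot)
    if isinstance(exc, PermissionError):
        # The user must act before work can continue, so block with a modal.
        return ClassifiedError(Severity.HIGH, True, Display.MODAL,
                               ["sign_in", "request_access"], snapshot)
    # Unknown failure: treat as critical and not automatically recoverable.
    return ClassifiedError(Severity.HIGH, False, Display.MODAL,
                           ["contact_support"], snapshot)


draft = {"prompt": "Summarize the Q3 report", "draft_text": "Revenue grew"}
result = classify_error(TimeoutError("agent call timed out"), draft)
print(result.display.value, result.recovery_options)
```

The returned `recovery_options` are what the interface would present for the final `user_choice` step, and `preserved_context` is what it would restore afterwards.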

📋 Do's & Don'ts

✅ Use clear, jargon-free language for error messages
✅ Provide specific, actionable solutions, not generic advice
✅ Preserve user context and work during error recovery
✅ Implement progressive disclosure for technical details
✅ Use consistent visual design for error states
❌ Blame users or use negative language in error messages
❌ Show technical stack traces or error codes to end users
❌ Use error dialogs for non-critical validation messages
❌ Leave users stranded without clear next steps
❌ Hide errors or fail silently without user notification

🚦 When to Use

Use When

  • Production systems with high user dependency
  • Critical workflows requiring reliability
  • Complex agent operations prone to failure
  • Systems handling sensitive or valuable data

Avoid When

  • Simple prototype or demonstration systems
  • Internal tools with technical users only
  • Low-stakes experimental applications
  • Systems with 100% reliable operations

📊 Key Metrics

Error Recovery Rate: % of errors successfully resolved automatically
User Error Resolution: Time from error to successful user resolution
Context Preservation: % of user work preserved during error recovery
Error Comprehension: User understanding of error cause and solution
Prevention Effectiveness: Reduction in preventable errors over time
Support Ticket Reduction: Decrease in error-related support requests
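
The first three metrics are simple ratios and averages over logged error events. The sketch below assumes a hypothetical `ErrorEvent` record; the field names and sample values are invented for illustration, and a real system would define "work items" in its own terms.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ErrorEvent:
    """Hypothetical log record for one error (field names are assumptions)."""

    auto_resolved: bool                     # fixed without user action
    seconds_to_resolution: Optional[float]  # None if never resolved
    work_items_before: int                  # units of user work at failure time
    work_items_after: int                   # units still intact after recovery


def error_recovery_rate(events: list[ErrorEvent]) -> float:
    """Percent of errors resolved automatically."""
    if not events:
        return 0.0
    return 100 * sum(e.auto_resolved for e in events) / len(events)


def user_resolution_time(events: list[ErrorEvent]) -> float:
    """Mean seconds from error to resolution for errors the user had to resolve."""
    times = [e.seconds_to_resolution for e in events
             if not e.auto_resolved and e.seconds_to_resolution is not None]
    return mean(times) if times else 0.0


def context_preservation(events: list[ErrorEvent]) -> float:
    """Percent of user work that survived recovery, across all events."""
    before = sum(e.work_items_before for e in events)
    after = sum(e.work_items_after for e in events)
    return 100 * after / before if before else 100.0


events = [
    ErrorEvent(True, 2.0, 10, 10),
    ErrorEvent(False, 40.0, 8, 6),
    ErrorEvent(False, 80.0, 2, 2),
    ErrorEvent(True, 1.0, 5, 5),
]
print(error_recovery_rate(events))   # 50.0
print(user_resolution_time(events))  # 60.0
print(context_preservation(events))  # 92.0
```

Error Comprehension, Prevention Effectiveness, and Support Ticket Reduction depend on surveys or longer-term trend data, so they are left out of this sketch.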

💡 Top Use Cases

Agent Communication Failures: Network timeouts with automatic retry and manual override
Input Validation Errors: Real-time feedback with correction suggestions
System Resource Limitations: Graceful degradation with alternative options
Permission/Authentication Issues: Clear explanations with resolution paths
Data Processing Errors: Context preservation with partial result recovery
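
The first use case, automatic retry with a manual override, might look like the sketch below. The names `call_with_retry`, `RetryExhausted`, and `should_cancel` are hypothetical, and the exponential backoff schedule is one common choice rather than something this pattern mandates. The `sleep` parameter is injectable so the example runs without real delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Raised when automatic retries stop without a successful result."""

    def __init__(self, attempts: int):
        super().__init__(f"operation failed after {attempts} attempt(s)")
        self.attempts = attempts


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    should_cancel: Callable[[], bool] = lambda: False,
) -> T:
    """Retry on TimeoutError with exponential backoff.

    should_cancel is the manual override: the UI can flip it so the user
    stops automatic retries and picks another recovery option.
    """
    attempts = 0
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        if should_cancel():
            break
        attempts = attempt
        try:
            return operation()
        except TimeoutError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise RetryExhausted(attempts) from last_error


calls = {"count": 0}


def flaky_agent_call() -> str:
    """Stand-in for an agent request that times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("agent did not respond")
    return "summary ready"


delays: list[float] = []
result = call_with_retry(flaky_agent_call, sleep=delays.append)
print(result, delays)  # summary ready [0.5, 1.0]
```

When `RetryExhausted` is raised, the interface would fall back to manual recovery options instead of failing silently.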
