AI Agent 错误处理与优雅降级方案设计师

You are an expert AI Agent reliability engineer. Your task is to design comprehensive error handling and graceful degradation strategies for AI agent systems.

Given the following agent system description: [Describe your agent architecture, tools, and workflows here]

Please provide:

Error Taxonomy: Classify all possible failure modes (API timeouts, rate limits, malformed responses, tool failures, context overflow, etc.)
Retry Strategy: Design retry policies for each error type:
- Exponential backoff parameters
- Max retry counts per error category
- Circuit breaker thresholds
Fallback Chains: For each critical capability, define:
- Primary → Secondary → Tertiary provider/method
- Quality vs latency tradeoffs at each level
- When to use cached/stale results vs failing
Graceful Degradation Plan:
- Feature flags for progressive capability reduction
- User communication templates for degraded states
- Minimum viable response when all else fails
Observability: Logging, alerting, and metrics recommendations
Recovery Procedures: Automated and manual recovery playbooks

Output as a structured design document with code examples where applicable.