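AI Agent Error Handling and Graceful Degradation Strategy Designer
Design robust error-handling mechanisms for AI agents, including retry strategies, fallback chains, timeout management, and graceful degradation plans, so that the agent can still deliver useful output across a wide range of failure scenarios.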
You are an expert AI Agent reliability engineer. Your task is to design comprehensive error handling and graceful degradation strategies for AI agent systems.
Given the following agent system description: [Describe your agent architecture, tools, and workflows here]
Please provide:
- Error Taxonomy: Classify all possible failure modes (API timeouts, rate limits, malformed responses, tool failures, context overflow, etc.)
- Retry Strategy: Design retry policies for each error type:
  - Exponential backoff parameters
  - Max retry counts per error category
  - Circuit breaker thresholds
- Fallback Chains: For each critical capability, define:
  - Primary → Secondary → Tertiary provider/method
  - Quality vs latency tradeoffs at each level
  - When to use cached/stale results vs failing
- Graceful Degradation Plan:
  - Feature flags for progressive capability reduction
  - User communication templates for degraded states
  - Minimum viable response when all else fails
- Observability: Logging, alerting, and metrics recommendations
- Recovery Procedures: Automated and manual recovery playbooks
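For illustration, a per-category retry policy of the kind requested under Retry Strategy could look roughly like the Python sketch below. It is a minimal sketch, not a prescribed implementation: the ErrorCategory taxonomy, the RetryPolicy values in RETRY_POLICIES, and the call_with_retry helper are hypothetical, and it assumes exponential backoff with "full jitter" (a uniformly random delay between zero and the capped exponential bound).

```python
import random
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class ErrorCategory(Enum):
    # Hypothetical taxonomy; extend it to match your agent's failure modes.
    TIMEOUT = "timeout"
    RATE_LIMIT = "rate_limit"
    MALFORMED_RESPONSE = "malformed_response"
    TOOL_FAILURE = "tool_failure"
    CONTEXT_OVERFLOW = "context_overflow"


class AgentError(Exception):
    """Exception tagged with a category from the taxonomy above."""

    def __init__(self, category: ErrorCategory, message: str = "") -> None:
        super().__init__(message or category.value)
        self.category = category


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int   # retries allowed after the first attempt
    base_delay: float  # seconds
    max_delay: float   # cap on any single backoff sleep, in seconds


# Illustrative values only (assumptions); tune per provider and SLA.
RETRY_POLICIES = {
    ErrorCategory.TIMEOUT: RetryPolicy(max_retries=3, base_delay=0.5, max_delay=8.0),
    ErrorCategory.RATE_LIMIT: RetryPolicy(max_retries=5, base_delay=1.0, max_delay=30.0),
    ErrorCategory.TOOL_FAILURE: RetryPolicy(max_retries=2, base_delay=0.2, max_delay=2.0),
    # One immediate re-sample may fix a malformed response.
    ErrorCategory.MALFORMED_RESPONSE: RetryPolicy(max_retries=1, base_delay=0.0, max_delay=0.0),
    # Context overflow needs truncation or summarization, not a retry.
    ErrorCategory.CONTEXT_OVERFLOW: RetryPolicy(max_retries=0, base_delay=0.0, max_delay=0.0),
}


def backoff_delay(policy: RetryPolicy, attempt: int) -> float:
    """Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)]."""
    bound = min(policy.max_delay, policy.base_delay * (2 ** attempt))
    return random.uniform(0.0, bound)


def call_with_retry(fn: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying AgentErrors according to their category's policy."""
    attempt = 0
    while True:
        try:
            return fn()
        except AgentError as err:
            policy = RETRY_POLICIES[err.category]
            if attempt >= policy.max_retries:
                raise  # retries exhausted: let the caller move down the fallback chain
            sleep(backoff_delay(policy, attempt))
            attempt += 1
```

The sleep parameter is injectable so the policy can be exercised in tests without real delays; in production the wrapped callable would be your own (hypothetical) client wrapper that maps provider exceptions onto AgentError categories.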
Output as a structured design document with code examples where applicable.
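Similarly, a fallback chain with per-provider circuit breakers, a stale-cache tier, and a minimum viable response might be sketched as follows. The Provider, CircuitBreaker, FallbackResult, and run_with_fallback names, the failure threshold, the reset timeout, and the one-hour staleness limit are all assumptions for illustration. The breaker is a simple single-threaded closed/open/half-open state machine and is not safe for concurrent use as written.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; after
    `reset_timeout` seconds it lets a trial call through (half-open)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls pass through
        # open: permit a trial call once the reset timeout has elapsed
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the timer


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


@dataclass
class FallbackResult:
    text: str
    source: str     # provider name, "cache", or "minimum_viable"
    degraded: bool  # True whenever the primary provider did not answer


def run_with_fallback(query: str, providers: List[Provider],
                      cache: Dict[str, Tuple[str, float]],
                      max_stale_seconds: float = 3600.0,
                      clock: Callable[[], float] = time.time) -> FallbackResult:
    # Tiers in order: primary, secondary, tertiary, ...
    for i, provider in enumerate(providers):
        if not provider.breaker.allow():
            continue  # circuit open: skip without calling the provider
        try:
            text = provider.call(query)
        except Exception:  # sketch only: catch narrower error types in practice
            provider.breaker.record_failure()
            continue
        provider.breaker.record_success()
        cache[query] = (text, clock())
        return FallbackResult(text, provider.name, degraded=i > 0)
    # Every provider failed or is open: serve a cached answer if fresh enough.
    if query in cache:
        text, stored_at = cache[query]
        if clock() - stored_at <= max_stale_seconds:
            return FallbackResult(text, "cache", degraded=True)
    # Minimum viable response: say so explicitly rather than fail silently.
    return FallbackResult(
        "Sorry, I can't complete this request right now. Please try again shortly.",
        "minimum_viable", degraded=True)
```

Returning source and degraded alongside the text lets the caller choose the matching user communication template and emit a per-tier metric, which connects the fallback chain to the degradation and observability items.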