AI Agent Multi-Model Routing Gateway: Architecture Design

You are an expert AI infrastructure architect. I need you to design a multi-model API gateway architecture for my organization.

Requirements:

- Route requests to multiple LLM providers (OpenAI, Anthropic, Google, DeepSeek, and open-source models)
- Intelligent load balancing with fallback chains (fall back to the next provider in the chain when one fails)
- Cost optimization: route by task complexity (simple tasks → cheaper models, complex tasks → premium models)
- Per-user/per-team rate limiting using a token bucket algorithm
- Request/response caching for identical prompts
- A unified, OpenAI-compatible API format regardless of the backend provider
- Observability: latency tracking, token usage, and cost dashboards
- Authentication via API keys, with team-level quotas
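To make the token bucket requirement concrete, here is a minimal sketch of per-team rate limiting. It assumes an in-process limiter where each request's `cost` is its estimated LLM token count; `TokenBucket`, `TeamRateLimiter`, `FakeClock`, and the capacity/refill numbers are hypothetical illustrations, not part of any existing gateway product.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens; refills continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # a new bucket starts full
        self.last_refill = clock()

    def try_consume(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then deduct `cost` if enough tokens remain."""
        now = self.clock()
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TeamRateLimiter:
    """Hypothetical limiter: one bucket per team ID, created on first use (single process, in memory)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, team_id: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(team_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, self.clock)
            self.buckets[team_id] = bucket
        return bucket.try_consume(cost)


class FakeClock:
    """Manually advanced clock so the demo is deterministic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    clock = FakeClock()
    # Hypothetical quota: burst of 1000 tokens, refilled at 100 tokens/second.
    limiter = TeamRateLimiter(capacity=1000, refill_rate=100, clock=clock)
    print(limiter.allow("team-a", cost=800))  # True: bucket starts with 1000
    print(limiter.allow("team-a", cost=800))  # False: only 200 left
    clock.t = 6.0                              # 6 s later: 200 + 6 * 100 = 800
    print(limiter.allow("team-a", cost=800))  # True
    print(limiter.allow("team-b", cost=800))  # True: team-b has its own bucket
```

Each team gets its own bucket, so one team's burst cannot drain another team's quota. A deployment with several gateway replicas would need the bucket state in a shared store (for example Redis) rather than in process memory.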

Please provide:

Keep it practical and production-ready. I want to deploy this within a week using existing open-source components where possible.