Real-Time Multi-Model Inference Cost Monitoring Dashboard Designer

You are a FinOps engineer specializing in LLM API cost optimization.

I need you to design a real-time monitoring dashboard for tracking multi-model LLM inference costs. Here is my setup: [describe your models, providers, and usage patterns]

Design the following:

1. Dashboard Layout

Create a detailed specification for a monitoring dashboard with these panels:

Cost Overview: Total spend (daily/weekly/monthly), burn rate, projected monthly cost
Per-Model Breakdown: Cost by model (GPT-4o, Claude Opus, Gemini Pro, etc.) with input/output token split
Per-Feature Breakdown: Cost by application feature or API endpoint
Token Efficiency: Average tokens per request, cache hit rates, prompt compression savings
Anomaly Detection: Spike alerts, unusual usage patterns, runaway loops (e.g., agents or retry logic that keep calling a model)
Cost Optimization Score: 0-100 score with actionable recommendations
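
As a reference point for the Cost Overview and Token Efficiency panels, here is a minimal Python sketch of a few of these metrics. It assumes that burn rate means the mean daily spend over a trailing 7-day window, that the monthly projection extends that rate over the days left in the month, and that `cached_tokens` is counted inside `input_tokens` (some providers report cached tokens separately, which changes the cache-hit denominator). All function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def burn_rate_usd_per_day(daily_spend: dict[date, float], today: date, window_days: int = 7) -> float:
    """Mean daily spend over the trailing window ending today (missing days count as $0)."""
    window = [daily_spend.get(today - timedelta(days=i), 0.0) for i in range(window_days)]
    return sum(window) / window_days


def projected_monthly_cost(daily_spend: dict[date, float], today: date, window_days: int = 7) -> float:
    """Month-to-date spend plus the current burn rate applied to the days left in the month."""
    month_to_date = sum(
        cost for day, cost in daily_spend.items()
        if day.year == today.year and day.month == today.month and day <= today
    )
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_left = days_in_month - today.day
    return month_to_date + burn_rate_usd_per_day(daily_spend, today, window_days) * days_left


def cache_hit_rate(records: list[dict]) -> float:
    """Share of input tokens served from cache (assumes cached_tokens is a subset of input_tokens)."""
    total_input = sum(r["input_tokens"] for r in records)
    cached = sum(r["cached_tokens"] for r in records)
    return cached / total_input if total_input else 0.0


def avg_tokens_per_request(records: list[dict]) -> float:
    """Average input + output tokens per logged request."""
    if not records:
        return 0.0
    return sum(r["input_tokens"] + r["output_tokens"] for r in records) / len(records)
```

A 7-day window smooths weekday/weekend swings; a shorter window reacts faster to spikes but makes the projection noisier.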

2. Alert Rules

Define alert thresholds:

Daily spend exceeds $X (configurable)
Single request costs more than $Y
Token usage spikes more than 3σ above the rolling average
Cache hit rate drops below Z%
Model error or timeout rate increases (tokens spent on failed requests are wasted)
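
One way to evaluate these rules is sketched below in Python. The `AlertThresholds` fields map to $X, $Y, and Z% above; `max_error_rate` is a hypothetical threshold for the error-rate rule, and the spike test compares the current period's token count against the sample mean and standard deviation of a trailing history. The names and the length of that history are assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class AlertThresholds:
    max_daily_spend_usd: float   # $X
    max_request_cost_usd: float  # $Y
    min_cache_hit_rate: float    # Z%, expressed as a fraction (0.30 == 30%)
    max_error_rate: float        # hypothetical: fraction of requests with status error/timeout
    spike_sigma: float = 3.0


def is_token_spike(history: list[int], current: int, sigma: float = 3.0) -> bool:
    """True if `current` is more than `sigma` standard deviations above the mean of `history`."""
    if len(history) < 2:
        return False  # statistics.stdev needs at least two data points
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current > mean  # flat history: any increase is unusual
    return (current - mean) / stdev > sigma


def evaluate_alerts(
    t: AlertThresholds,
    daily_spend_usd: float,
    max_request_cost_usd: float,
    token_history: list[int],
    current_tokens: int,
    cache_hit_rate: float,
    error_rate: float,
) -> list[str]:
    """Return the names of all alert rules that fire for the current period."""
    fired = []
    if daily_spend_usd > t.max_daily_spend_usd:
        fired.append("daily_spend")
    if max_request_cost_usd > t.max_request_cost_usd:
        fired.append("single_request_cost")
    if is_token_spike(token_history, current_tokens, t.spike_sigma):
        fired.append("token_spike")
    if cache_hit_rate < t.min_cache_hit_rate:
        fired.append("cache_hit_rate")
    if error_rate > t.max_error_rate:
        fired.append("error_rate")
    return fired
```

Because the 3σ check assumes roughly stable traffic, workloads with strong daily or weekly cycles may be better served by comparing against the same hour or weekday in the history.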

3. Data Schema

Design the per-request logging schema, capturing at least these fields:

{
  "timestamp": "ISO-8601",
  "model": "string",
  "provider": "string",
  "feature": "string",
  "input_tokens": "int",
  "output_tokens": "int",
  "cached_tokens": "int",
  "cost_usd": "float",
  "latency_ms": "int",
  "status": "success|error|timeout"
}
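
The schema can be enforced at write time. The Python sketch below models one log record as a dataclass and derives `cost_usd` from token counts, assuming cached tokens are billed at a discounted input rate and are included in `input_tokens`. The model names and per-million-token prices are placeholders, not real price sheets; substitute your providers' current pricing.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Placeholder prices in USD per million tokens; NOT real provider pricing.
PRICES_PER_MTOK = {
    "example-large": {"input": 5.00, "cached_input": 2.50, "output": 15.00},
    "example-small": {"input": 0.25, "cached_input": 0.125, "output": 1.25},
}
VALID_STATUSES = {"success", "error", "timeout"}


@dataclass
class InferenceLog:
    timestamp: str  # ISO-8601, UTC
    model: str
    provider: str
    feature: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int
    cost_usd: float
    latency_ms: int
    status: str  # "success" | "error" | "timeout"


def compute_cost_usd(model: str, input_tokens: int, output_tokens: int, cached_tokens: int) -> float:
    """Price uncached input, cached input, and output tokens at their own per-million rates."""
    p = PRICES_PER_MTOK[model]
    uncached = input_tokens - cached_tokens
    return (uncached * p["input"] + cached_tokens * p["cached_input"] + output_tokens * p["output"]) / 1_000_000


def make_log(model: str, provider: str, feature: str, input_tokens: int, output_tokens: int,
             cached_tokens: int, latency_ms: int, status: str) -> InferenceLog:
    """Validate the inputs and build a record with a UTC timestamp and derived cost."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    return InferenceLog(
        timestamp=datetime.now(timezone.utc).isoformat(),
        model=model, provider=provider, feature=feature,
        input_tokens=input_tokens, output_tokens=output_tokens, cached_tokens=cached_tokens,
        cost_usd=compute_cost_usd(model, input_tokens, output_tokens, cached_tokens),
        latency_ms=latency_ms, status=status,
    )


if __name__ == "__main__":
    record = make_log("example-large", "provider-a", "chat", 10_000, 2_000, 4_000, 850, "success")
    print(json.dumps(asdict(record)))  # one JSON line per request
```

Computing cost at write time keeps dashboard queries simple, but it freezes the price in effect at that moment; storing the token counts alongside it allows recomputation if a price sheet changes.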

4. Optimization Recommendations Engine

Based on usage patterns, automatically suggest:

Model downgrades for simple tasks (e.g., use Haiku instead of Opus for classification)
Prompt caching opportunities
Batch processing candidates
Rate limiting strategies
Provider arbitrage (the cheapest model or provider that delivers equivalent quality)
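
These suggestions can start as simple heuristics over the logged records before any quality evaluation exists. The Python sketch below is one such heuristic pass, not a definitive engine, and it covers only downgrades, caching, and batching; rate limiting and provider arbitrage need rate-limit and quality data that the schema above does not capture. The cheaper-alternative mapping, the thresholds (average outputs under 50 tokens as a proxy for classification-like tasks, average prompts of at least 1,024 tokens as caching candidates), and the set of latency-tolerant features are all hypothetical. It also returns a 0-100 Cost Optimization Score, assumed here to be the percentage of spend not flagged by any rule.

```python
from collections import defaultdict

# Hypothetical mapping from a premium model to a cheaper tier worth evaluating.
CHEAPER_ALTERNATIVE = {"example-large": "example-small"}
SHORT_OUTPUT_TOKENS = 50          # avg output below this suggests classification-like work
CACHE_MIN_PROMPT_TOKENS = 1024    # avg prompt at least this long is a caching candidate
CACHE_MIN_HIT_RATE = 0.20
LATENCY_TOLERANT_FEATURES = {"nightly-report"}  # hypothetical offline features


def recommend(logs: list[dict]) -> tuple[list[str], int]:
    """Return (recommendations, cost_optimization_score) for a list of log records."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for r in logs:
        groups[(r["feature"], r["model"])].append(r)

    total_spend = sum(r["cost_usd"] for r in logs)
    flagged_spend = 0.0
    recs: list[str] = []
    for (feature, model), rows in sorted(groups.items()):
        n = len(rows)
        total_in = sum(r["input_tokens"] for r in rows)
        avg_out = sum(r["output_tokens"] for r in rows) / n
        hit_rate = sum(r["cached_tokens"] for r in rows) / total_in if total_in else 0.0
        flagged = False
        if model in CHEAPER_ALTERNATIVE and avg_out < SHORT_OUTPUT_TOKENS:
            recs.append(f"downgrade: {feature} gets short outputs from {model}; evaluate {CHEAPER_ALTERNATIVE[model]}")
            flagged = True
        if total_in / n >= CACHE_MIN_PROMPT_TOKENS and hit_rate < CACHE_MIN_HIT_RATE:
            recs.append(f"cache: {feature} sends long prompts to {model} with a {hit_rate:.0%} cache hit rate")
            flagged = True
        if feature in LATENCY_TOLERANT_FEATURES:
            recs.append(f"batch: {feature} is latency-tolerant; consider batch processing")
            flagged = True
        if flagged:
            flagged_spend += sum(r["cost_usd"] for r in rows)

    # Hypothetical score definition: percentage of spend not flagged by any rule.
    score = 100 if total_spend == 0 else round(100 * (1 - flagged_spend / total_spend))
    return recs, score
```

Grouping by feature and model, rather than by model alone, keeps a downgrade suggestion from being diluted by other features that genuinely need the larger model.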

Output the complete dashboard specification, including Mermaid diagrams for the data flow, SQL queries for the key metrics, and implementation recommendations for Grafana, Datadog, or a custom build.
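
For the SQL queries, the sketch below runs two key-metric queries against the logging schema: per-model spend with the input/output token split and cache hit rate, and spend attributed to non-successful requests. It uses Python's built-in `sqlite3` so it is self-contained. The table name `inference_logs` and the helper `open_db` are assumptions, and date functions differ across SQL dialects, so the `date(timestamp)` filter will need adapting for a production warehouse.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inference_logs (
    timestamp     TEXT NOT NULL,     -- ISO-8601, e.g. 2024-06-10T09:00:00Z
    model         TEXT NOT NULL,
    provider      TEXT NOT NULL,
    feature       TEXT NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    cached_tokens INTEGER NOT NULL,
    cost_usd      REAL NOT NULL,
    latency_ms    INTEGER NOT NULL,
    status        TEXT NOT NULL CHECK (status IN ('success', 'error', 'timeout'))
)
"""

# Per-model breakdown for one day: spend, input/output token split, cache hit rate.
PER_MODEL_SQL = """
SELECT model,
       SUM(cost_usd)      AS spend_usd,
       SUM(input_tokens)  AS input_tokens,
       SUM(output_tokens) AS output_tokens,
       ROUND(1.0 * SUM(cached_tokens) / NULLIF(SUM(input_tokens), 0), 4) AS cache_hit_rate
FROM inference_logs
WHERE date(timestamp) = :day
GROUP BY model
ORDER BY spend_usd DESC
"""

# Spend attributed to non-successful requests (error or timeout), per feature.
WASTED_SPEND_SQL = """
SELECT feature, SUM(cost_usd) AS wasted_usd
FROM inference_logs
WHERE status <> 'success'
GROUP BY feature
ORDER BY wasted_usd DESC
"""


def open_db(rows: list[tuple]) -> sqlite3.Connection:
    """Create an in-memory database with the log table and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO inference_logs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn
```

A dashboard panel would run `conn.execute(PER_MODEL_SQL, {"day": "2024-06-10"}).fetchall()` with the day supplied by its time-range filter; `NULLIF` keeps the cache-hit ratio from dividing by zero on days with no input tokens.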