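LLM Application Observability and Monitoring Solution Designer
Design an end-to-end observability solution for LLM applications, covering trace tracking, metrics monitoring, prompt version management, evaluation experiments, and cost analysis.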
You are an expert LLM Observability Architect. Help me design a comprehensive observability strategy for my LLM application.
Context
- Application type: [chatbot / RAG / agent / code assistant]
- Scale: [requests per day]
- Models used: [GPT-4 / Claude / local models]
- Current pain points: [latency / cost / quality / debugging]
Your Tasks
- Trace Design: Design a tracing schema that captures the full lifecycle of each LLM request (prompt construction → model call → post-processing → response). Include parent-child span relationships for multi-step agent workflows.
- Key Metrics Dashboard: Define the top 10 metrics I should track, covering at least:
  - Latency percentiles (p50, p95, p99)
  - Token usage and cost per request/user/feature
  - Error rates and retry patterns
  - Model quality scores (user feedback, auto-eval)
  - Cache hit rates
- Prompt Version Management: Design a prompt versioning strategy:
  - How to A/B test prompt variants
  - Rollback procedures
  - Performance comparison framework
- Evaluation Pipeline: Create an automated eval framework:
  - Define eval criteria (relevance, faithfulness, toxicity)
  - Design golden dataset management
  - Set up regression detection alerts
- Cost Optimization: Analyze current usage and recommend:
  - Model routing strategies (cheap model for simple queries)
  - Caching layers (semantic cache design)
  - Token optimization techniques
- Alert Rules: Define actionable alert thresholds for:
  - Latency spikes
  - Cost anomalies
  - Quality degradation
  - Error rate increases
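
As a reference for the level of detail I expect in the alert rules, here is a minimal sketch of a threshold check over the four alert categories above. The metric names (`latency_p95_ms`, `hourly_cost_usd`, `eval_faithfulness`, `error_rate`), the threshold values, and the severity labels are all placeholder assumptions, not recommendations; replace them with values derived from my actual baselines.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """One threshold rule; every name and number here is a placeholder."""
    metric: str       # key looked up in the metrics snapshot
    threshold: float  # boundary value for the comparison
    direction: str    # "above" fires when value > threshold, "below" when value < threshold
    severity: str     # e.g. "page" or "ticket"


# Hypothetical starting thresholds; tune them against real baselines.
RULES: List[AlertRule] = [
    AlertRule("latency_p95_ms", 3000.0, "above", "page"),
    AlertRule("hourly_cost_usd", 50.0, "above", "ticket"),
    AlertRule("eval_faithfulness", 0.80, "below", "ticket"),
    AlertRule("error_rate", 0.05, "above", "page"),
]


def evaluate_alerts(snapshot: Dict[str, float], rules: List[AlertRule] = RULES) -> List[str]:
    """Return one message per rule that fires; metrics missing from the snapshot are skipped."""
    fired = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is None:
            continue
        if rule.direction == "above":
            breached = value > rule.threshold
        else:
            breached = value < rule.threshold
        if breached:
            fired.append(
                f"[{rule.severity}] {rule.metric}={value} is {rule.direction} {rule.threshold}"
            )
    return fired
```

Calling `evaluate_alerts` with a dictionary of current metric values returns the list of fired alerts. Static thresholds like these are only a starting point; anomaly-based rules may suit cost and quality signals better.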
Output a complete implementation plan with architecture diagrams (in Mermaid), code snippets for instrumentation, and a 30-day rollout timeline.
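
For the instrumentation snippets, the following minimal sketch shows the kind of parent-child span structure I have in mind for a single request. It uses only the Python standard library. `Tracer`, `Span`, `handle_request`, the span names, and the attribute keys are hypothetical names chosen for illustration, and the model call is a stand-in. A production version would likely export spans through OpenTelemetry or a similar backend rather than keep them in memory.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: Optional[float] = None
    attributes: Dict[str, Any] = field(default_factory=dict)


class Tracer:
    """Collects finished spans in memory (single-threaded sketch)."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str, **attributes: Any) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(
            name=name,
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            start=time.perf_counter(),
            attributes=dict(attributes),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(current)  # children finish before their parent


def handle_request(tracer: Tracer, question: str) -> str:
    """Prompt construction -> model call -> post-processing under one root span."""
    with tracer.span("llm_request", app="chatbot"):
        with tracer.span("prompt_construction", prompt_version="v1"):
            prompt = f"Answer briefly: {question}"
        with tracer.span("model_call", model="placeholder-model") as call:
            completion = prompt.upper()  # stand-in for a real model call
            # Whitespace word counts, used here as a rough proxy for token counts.
            call.attributes["prompt_tokens"] = len(prompt.split())
            call.attributes["completion_tokens"] = len(completion.split())
        with tracer.span("post_processing"):
            answer = completion.strip()
    return answer
```

After `handle_request(Tracer(), "...")` runs, the tracer holds four spans sharing one `trace_id`, with the three stage spans pointing at the root `llm_request` span through `parent_id`. Multi-step agent workflows would nest further spans (tool calls, retrieval steps) the same way.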