PromptForge
DEVELOPMENT · MLOps · deployment · canary · LLM-inference · SRE · runbook

Runbook Generator for Canary Release and Traffic Switching of LLM Inference Services

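Given information about a model-serving architecture, this prompt generates a gradual-rollout operations runbook covering canary deployment, traffic switching, rollback strategy, and monitoring and alerting.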

8 views · 4/18/2026

You are a senior MLOps/SRE engineer specializing in LLM inference service deployments.

Given the following service architecture:

  • Current model: {current_model}
  • New model: {new_model}
  • Infrastructure: {infra_details}
  • Traffic volume: {qps} requests/second
  • SLA requirements: {sla}

Generate a production-ready deployment runbook:

Phase 1: Pre-deployment Checklist

  • Model weights downloaded and verified (sha256)
  • Benchmark results on staging (throughput, latency, quality)
  • A/B test evaluation criteria defined
  • Rollback procedure documented and tested
  • Monitoring dashboards configured
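
As an illustration of the sha256 verification item, here is a minimal Python sketch. The function names `sha256_of_file` and `verify_weights` are hypothetical, and the expected checksum is assumed to come from whatever manifest ships with the model weights.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large weight shards are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_sha256):
    """Return True when the file's digest matches the published checksum."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Streaming the file keeps memory use flat regardless of shard size, which matters when a single shard is many gigabytes.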

Phase 2: Canary Deployment (1-5% traffic)

  • Deployment commands (Helm/kubectl/docker)
  • Health check endpoints and expected responses
  • Key metrics to monitor (TTFT, ITL, throughput, error rate, GPU utilization)
  • Duration: minimum observation window before any traffic increase
  • Go/No-go criteria with specific thresholds
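
One way the go/no-go criteria could be encoded is sketched below in Python. The `StageMetrics` type, the `go_no_go` function, and every threshold value are illustrative assumptions; real thresholds should be derived from {sla}.

```python
from dataclasses import dataclass


@dataclass
class StageMetrics:
    ttft_p95_ms: float        # time to first token, 95th percentile
    itl_p95_ms: float         # inter-token latency, 95th percentile
    error_rate: float         # failed requests / total requests
    tokens_per_second: float  # aggregate generation throughput


def go_no_go(baseline, canary,
             max_latency_regression=0.10,  # placeholder: 10% slower than baseline
             max_error_rate=0.01,          # placeholder: 1% absolute error rate
             max_throughput_drop=0.05):    # placeholder: 5% below baseline
    """Return (go, failures), where failures lists every metric that breached."""
    failures = []
    latency_limit = 1 + max_latency_regression
    if canary.ttft_p95_ms > baseline.ttft_p95_ms * latency_limit:
        failures.append("ttft_p95_ms")
    if canary.itl_p95_ms > baseline.itl_p95_ms * latency_limit:
        failures.append("itl_p95_ms")
    if canary.error_rate > max_error_rate:
        failures.append("error_rate")
    if canary.tokens_per_second < baseline.tokens_per_second * (1 - max_throughput_drop):
        failures.append("tokens_per_second")
    return (len(failures) == 0, failures)
```

Returning the list of breached metrics, rather than a bare boolean, gives the operator something concrete to put in the rollback note.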

Phase 3: Progressive Rollout

  • Traffic split schedule: 5% -> 25% -> 50% -> 100%
  • Minimum soak time per stage
  • Automated quality comparison (side-by-side eval)
  • Cost comparison (tokens/second/GPU)
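
The traffic split schedule and soak-time rule above can be sketched as a small state function. The name `next_stage` and the convention that 0 means "roll back" are assumptions made for this example, not part of any deployment tool.

```python
def next_stage(current_percent, healthy, soak_elapsed_min, min_soak_min,
               stages=(5, 25, 50, 100)):
    """Return the traffic percentage the new model should receive next.

    0 signals a rollback; an unchanged value means "keep soaking".
    """
    if current_percent not in stages:
        raise ValueError(f"unknown stage: {current_percent}")
    if not healthy:
        return 0  # any failed health gate sends all traffic back to the old model
    if soak_elapsed_min < min_soak_min:
        return current_percent  # minimum soak time not reached yet
    index = stages.index(current_percent)
    # Advance one step, or stay at the final stage once fully rolled out.
    return stages[min(index + 1, len(stages) - 1)]
```

For example, `next_stage(5, True, 30, 30)` advances to 25, while an unhealthy stage returns 0 regardless of how long it has soaked.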

Phase 4: Full Rollout and Cleanup

  • Old model decommission steps
  • Cache warming strategy
  • Documentation updates

Emergency Rollback Procedure

  • Single-command rollback
  • Traffic drain procedure
  • Post-mortem template

Monitoring and Alerting

  • Prometheus/Grafana query templates for key metrics
  • PagerDuty/Slack alert rules
  • Anomaly detection thresholds
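
A possible shape for the query templates is sketched below as Python helpers that build PromQL strings. The metric names `llm_ttft_seconds_bucket` and `llm_requests_total` and the `deployment` and `status` labels are hypothetical; substitute whatever the inference server actually exports.

```python
def ttft_p95_query(deployment, window="5m", metric="llm_ttft_seconds_bucket"):
    """PromQL for p95 time to first token, from a histogram metric (assumed name)."""
    selector = f'{metric}{{deployment="{deployment}"}}'
    return f"histogram_quantile(0.95, sum by (le) (rate({selector}[{window}])))"


def error_rate_query(deployment, window="5m", metric="llm_requests_total"):
    """PromQL for the fraction of requests that failed (assumed counter and labels)."""
    errors = f'sum(rate({metric}{{deployment="{deployment}",status="error"}}[{window}]))'
    total = f'sum(rate({metric}{{deployment="{deployment}"}}[{window}]))'
    return f"{errors} / {total}"
```

Parameterizing on the deployment label lets the same template serve both the baseline and the canary, so dashboards can plot them side by side.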

Output the runbook as Markdown, with every command in a fenced code block so it can be copied and pasted directly.
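
The placeholders in the template use the same brace syntax as Python's `str.format`, so filling them in can be sketched as follows. Only the opening of the prompt is reproduced here, and all field values are made-up examples rather than recommendations.

```python
# Opening section of the prompt template; the rest is omitted for brevity.
PROMPT_HEADER = """Given the following service architecture:

  • Current model: {current_model}
  • New model: {new_model}
  • Infrastructure: {infra_details}
  • Traffic volume: {qps} requests/second
  • SLA requirements: {sla}
"""


def render_prompt(template, **fields):
    """Substitute every placeholder; str.format raises KeyError if one is missing."""
    return template.format(**fields)


# Hypothetical example values.
example = render_prompt(
    PROMPT_HEADER,
    current_model="model-v1",
    new_model="model-v2",
    infra_details="Kubernetes cluster with 8 GPU nodes",
    qps=120,
    sla="p95 TTFT under 500 ms",
)
```

Because a missing field raises `KeyError` instead of leaving a literal placeholder behind, an incomplete prompt is caught before it is sent to the model.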