PromptForge
Developer Tools · SRE · DevOps · Incident Response · Ops Automation

AI SRE Incident Response Automation Playbook Generator

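Automatically generates an incident response standard operating procedure (runbook) from your system architecture and alert information, including diagnostic steps, remediation commands, and escalation paths.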

4/18/2026

You are an expert Site Reliability Engineer. I will describe a production incident and my system architecture. Generate a comprehensive incident response runbook.

System Context

  • Architecture: [describe your services, databases, message queues, etc.]
  • Monitoring: [Prometheus/Grafana/Datadog/etc.]
  • Alert: [paste the alert or describe the symptom]
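If you reuse this prompt often, the bracketed placeholders can be filled programmatically instead of by hand. The following is a minimal sketch, assuming you keep the prompt as a string template. It uses Python's standard `string.Template`, while the names `PROMPT_TEMPLATE` and `build_prompt` and the example architecture and alert values are hypothetical.

```python
from string import Template

# Hypothetical template mirroring the System Context block of the prompt;
# $architecture, $monitoring and $alert stand in for the bracketed placeholders.
PROMPT_TEMPLATE = Template(
    "You are an expert Site Reliability Engineer. I will describe a production "
    "incident and my system architecture. Generate a comprehensive incident "
    "response runbook.\n\n"
    "System Context\n"
    "  • Architecture: $architecture\n"
    "  • Monitoring: $monitoring\n"
    "  • Alert: $alert\n"
)


def build_prompt(architecture: str, monitoring: str, alert: str) -> str:
    """Fill the System Context placeholders, rejecting empty fields."""
    fields = {"architecture": architecture, "monitoring": monitoring, "alert": alert}
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    # substitute() raises KeyError if any placeholder is left unfilled.
    return PROMPT_TEMPLATE.substitute(fields)


if __name__ == "__main__":
    # Illustrative values only; replace with your own system and alert.
    print(build_prompt(
        architecture="Go API on Kubernetes, PostgreSQL 15, Kafka",
        monitoring="Prometheus + Grafana",
        alert="HighErrorRate: 5xx ratio > 5% for 10m on checkout-api",
    ))
```

Rejecting empty fields up front is a design choice: a prompt sent with a blank alert tends to produce exactly the generic runbook the prompt asks the model to avoid.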

Generate:

  1. Triage Checklist: 5-8 immediate diagnostic steps with exact commands (kubectl, curl, SQL queries)
  2. Root Cause Decision Tree: a flowchart in text form (if X, check Y; if Z, the likely cause is W)
  3. Mitigation Actions: ranked by speed, fastest first: (a) quick hotfix, (b) rollback steps, (c) scaling/failover
  4. Communication Template: a status page update and a Slack message for stakeholders
  5. Post-Incident Tasks: follow-up items to prevent recurrence
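The Root Cause Decision Tree requested in item 2 ("if X, check Y; if Z, the likely cause is W") maps naturally onto a small tree of check nodes and cause leaves. This minimal Python sketch shows one way to encode and walk such a tree once the model has produced it. The `Check`/`Cause` types, the `diagnose` function, and the example questions and fact keys are all hypothetical illustrations, not output of the prompt.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Union

# Observed facts gathered during triage, e.g. {"recent_deploy": True}.
Facts = Mapping[str, bool]


@dataclass
class Cause:
    """Leaf node: the likely root cause once the tree bottoms out."""
    description: str


@dataclass
class Check:
    """Inner node: 'if X, check Y'. `test` inspects the observed facts."""
    question: str
    test: Callable[[Facts], bool]
    if_yes: Node
    if_no: Node


Node = Union[Check, Cause]


def diagnose(node: Node, facts: Facts) -> tuple[str, list[str]]:
    """Walk the tree; return the likely cause and the questions answered on the way."""
    path: list[str] = []
    while isinstance(node, Check):
        answer = node.test(facts)
        path.append(f"{node.question} -> {'yes' if answer else 'no'}")
        node = node.if_yes if answer else node.if_no
    return node.description, path


# Hypothetical tree for a "high 5xx error rate" alert.
TREE: Node = Check(
    question="Did a deploy happen in the last 30 minutes?",
    test=lambda f: f.get("recent_deploy", False),
    if_yes=Cause("Regression in the latest release: consider rollback"),
    if_no=Check(
        question="Are database connections saturated?",
        test=lambda f: f.get("db_pool_exhausted", False),
        if_yes=Cause("Connection pool exhaustion: check slow queries"),
        if_no=Cause("Unknown: escalate to the owning team"),
    ),
)
```

For example, `diagnose(TREE, {"recent_deploy": False, "db_pool_exhausted": True})` returns the connection-pool cause along with the two questions it answered. Keeping the path is useful when writing the post-incident timeline.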

Format each section clearly with copy-pasteable commands. Use realistic Linux/K8s/cloud CLI syntax. Be specific, not generic.
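Before filing a generated runbook in your wiki, a quick automated check can confirm that it follows these formatting instructions. This minimal sketch assumes the model repeats the five section titles verbatim and puts commands inside Markdown triple-backtick fences. The names `check_runbook` and `REQUIRED_SECTIONS` are hypothetical, and passing the check says nothing about whether the commands are correct.

```python
import re

# Section titles requested by the prompt; assumes the model repeats them verbatim.
REQUIRED_SECTIONS = (
    "Triage Checklist",
    "Root Cause Decision Tree",
    "Mitigation Actions",
    "Communication Template",
    "Post-Incident Tasks",
)

# Built indirectly so this snippet can itself live inside a Markdown fence.
FENCE = "`" * 3

# Opening fence with an optional language tag, then the body, then the closing fence.
CODE_BLOCK = re.compile(re.escape(FENCE) + r"[\w-]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def check_runbook(text: str) -> list[str]:
    """Return a list of format problems; an empty list means the basic checks passed."""
    lowered = text.lower()
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s.lower() not in lowered]
    blocks = CODE_BLOCK.findall(text)
    if not blocks:
        problems.append("no fenced command blocks found")
    elif not any(re.search(r"\b(kubectl|curl|select)\b", b, re.IGNORECASE) for b in blocks):
        problems.append("no kubectl, curl, or SQL commands inside fenced blocks")
    return problems
```

An empty result only means the runbook is structurally complete. The commands still need review by someone who knows the system before anyone runs them during an incident.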