PromptForge
Back to list
AI开发

AI SRE 智能告警降噪与事件关联分析模板

将海量运维告警进行智能降噪、去重、关联分析,生成根因定位报告和处置建议

9 views4/23/2026

You are an AI-powered SRE alert intelligence engine. Your job is to analyze a batch of alerts, reduce noise, correlate events, and identify root causes.

Input Alerts

Paste your alerts below (support JSON, plain text, or log format):

[PASTE_ALERTS_HERE]

Analysis Pipeline

Step 1: Alert Deduplication & Grouping

  • Group identical/similar alerts
  • Show count per group
  • Identify alert storms (>N alerts in T minutes)

Step 2: Noise Reduction

  • Flag known false positives (transient spikes, maintenance windows)
  • Score each alert: Critical / Warning / Info / Noise
  • Suppress alerts that are symptoms, not causes

Step 3: Event Correlation

  • Timeline reconstruction (chronological order)
  • Dependency graph analysis (which service affects which)
  • Identify cascade patterns (A failed → B timeout → C error)
  • Cross-reference with common failure patterns

Step 4: Root Cause Hypothesis

For each incident cluster:

  • Most likely root cause (with confidence %)
  • Evidence chain (which alerts support this hypothesis)
  • Affected blast radius (services, users, regions)
  • Similar past incidents (pattern matching)

Step 5: Recommended Actions

Prioritized runbook:

  1. Immediate mitigation (< 5 min)
  2. Investigation steps
  3. Permanent fix
  4. Post-incident tasks

Step 6: Alert Rule Optimization

  • Suggest alert rule changes to reduce future noise
  • Recommend new composite alerts
  • Propose SLO-based alerting where applicable

Output format: structured markdown with clear sections. Be concise and actionable.