AI Voice Meeting Real-Time Summarization and Action Item Extraction System Design

You are a Voice AI systems architect. Design a real-time meeting summarization system that transcribes speech, generates summaries, and extracts action items.

Requirements

Meeting type: [TYPE, e.g., standup, brainstorm, client call, all-hands]
Participants: [NUMBER]
Languages: [LANGUAGES]
Deployment: [cloud/on-premise/hybrid]

System Design

1. Audio Pipeline

Real-time speech-to-text engine selection:
- Compare: OpenAI Whisper (e.g., large-v3), Deepgram, AssemblyAI, Azure Speech
- Latency requirement: under 2 s from speech to on-screen text for live captions
Speaker diarization (who said what)
Noise cancellation and audio enhancement
Multi-language detection and switching
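
One way to picture the "who said what" step is to merge the speech-to-text output with the diarization output by time overlap. The sketch below is only an illustration under assumptions: the `Segment` and `SpeakerTurn` types and the `assign_speakers` helper are hypothetical names, not part of any engine listed above, and real engines return richer structures.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (times in seconds). Hypothetical type."""
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    """A diarized span attributed to one speaker. Hypothetical type."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="unknown"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled
```

Segments with no overlapping turn fall back to the `unknown` label, which keeps gaps in diarization visible instead of guessing an owner.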

2. Live Summarization Engine

Sliding-window summarization (refresh the running summary every 5 minutes)
Topic segmentation and labeling
Key decision detection and highlighting
Disagreement/consensus detection
Sentiment tracking per speaker
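
The five-minute cadence above could be sketched as follows. This is a simplified reading that uses fixed, non-overlapping five-minute windows; the `summarize` argument is a stand-in for whatever model call the design ends up using, and both function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # five minutes, as in the requirement above


def bucket_by_window(utterances, window_seconds=WINDOW_SECONDS):
    """Group (start_seconds, speaker, text) tuples into fixed time windows.

    Returns a list of (window_start_seconds, joined_text), ordered by time.
    """
    buckets = defaultdict(list)
    for start, speaker, text in utterances:
        buckets[int(start // window_seconds)].append(f"{speaker}: {text}")
    return [
        (idx * window_seconds, "\n".join(lines))
        for idx, lines in sorted(buckets.items())
    ]


def summarize_windows(utterances, summarize, window_seconds=WINDOW_SECONDS):
    """Apply a caller-supplied summarizer (assumed: str -> str) to each window."""
    return [
        (start, summarize(text))
        for start, text in bucket_by_window(utterances, window_seconds)
    ]
```

A true sliding window would also carry some trailing context from the previous window into the next one; that is left out here to keep the sketch short.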

3. Action Item Extraction

Pattern recognition for commitments:
- "I will...", "Let's...", "By Friday we need..."
- Implicit assignments from context
Structured output per action item:
- Owner (speaker name)
- Task description
- Deadline (if mentioned)
- Priority (inferred)
- Dependencies
Auto-create tasks in project management tools (Jira, Linear, Things)
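
A minimal sketch of the explicit-commitment path, assuming simple regular expressions are an acceptable first pass. The patterns, the `ActionItem` fields, and `extract_action_items` are illustrative assumptions; implicit assignments, priority inference, and dependencies would need a language model and are left at their defaults here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative patterns for the example phrases listed above.
COMMITMENT_PATTERNS = [
    re.compile(r"\bI will\b|\bI'll\b", re.IGNORECASE),
    re.compile(r"\bLet's\b", re.IGNORECASE),
    re.compile(r"\bwe need\b", re.IGNORECASE),
]
DEADLINE_PATTERN = re.compile(
    r"\bby (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
    r"|tomorrow|end of (?:day|week))\b",
    re.IGNORECASE,
)


@dataclass
class ActionItem:
    owner: str
    task: str
    deadline: Optional[str] = None
    priority: str = "unspecified"  # inference is out of scope for this sketch
    dependencies: List[str] = field(default_factory=list)


def extract_action_items(utterances):
    """Scan (speaker, text) pairs and return one ActionItem per commitment.

    The speaker is used as the owner, which is only a guess for
    collective phrasing such as "we need".
    """
    items = []
    for speaker, text in utterances:
        if any(p.search(text) for p in COMMITMENT_PATTERNS):
            match = DEADLINE_PATTERN.search(text)
            items.append(
                ActionItem(
                    owner=speaker,
                    task=text.strip(),
                    deadline=match.group(1) if match else None,
                )
            )
    return items
```

Keeping the output as a typed record makes the later hand-off to a task tracker a plain field mapping.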

4. Post-Meeting Deliverables

Executive summary (3-5 bullet points)
Full structured minutes with timestamps
Action item checklist with owners
Open questions that were raised but not resolved, flagged for follow-up
Searchable transcript with topic bookmarks
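
As a rough illustration of the timestamped minutes and the owner checklist, the following sketch renders both as plain text. The input tuple shapes and the function names are assumptions made for this example only.

```python
def format_timestamp(seconds):
    """Render a second offset as HH:MM:SS."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_minutes(topics, action_items):
    """Build plain-text minutes.

    topics: list of (start_seconds, label, summary)
    action_items: list of (owner, task, deadline_or_None)
    """
    lines = ["Minutes"]
    for start, label, summary in topics:
        lines.append(f"[{format_timestamp(start)}] {label}: {summary}")
    lines.append("Action items")
    for owner, task, deadline in action_items:
        due = f" (due {deadline})" if deadline else ""
        lines.append(f"[ ] {owner}: {task}{due}")
    return "\n".join(lines)
```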

5. Integration Architecture

Calendar integration for auto-start
Slack/Teams notification with summary
CRM update for client meetings
Knowledge base indexing for institutional memory
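
For the Slack notification, one possible sketch builds a request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL is a placeholder to be supplied by the deployment, `build_slack_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url, meeting_title, bullets):
    """Prepare (but do not send) a POST carrying the executive summary."""
    text = f"*{meeting_title}*\n" + "\n".join(f"• {b}" for b in bullets)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would be a call to `urllib.request.urlopen(request)` wrapped in error handling and retries; Teams and CRM updates would need their own payload shapes.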

Provide a description of the architecture diagram, the technology stack, estimated costs, and a 4-week MVP timeline.