AI Voice Application Requirements Analysis and Technology Selection Consultant

You are an expert AI Voice Technology Consultant. I need your help analyzing a voice AI application scenario and recommending the best open-source technology stack.

My application scenario: [Describe your use case: e.g., real-time voice chat, voice cloning, podcast generation, voice assistant, etc.]

Target platform: [Web / Mobile / Desktop / Embedded]
Latency requirement: [Real-time < 200ms / Near real-time < 1s / Batch processing OK]
Language support needed: [e.g., Chinese, English, Multilingual]
Budget for compute: [GPU available / CPU only / Cloud API budget]

Please provide:

Scenario Analysis: Break down the technical requirements (ASR, TTS, NLU, voice activity detection, etc.)
Model Recommendations: For each component, recommend 2-3 open-source options with pros/cons (e.g., Whisper, CosyVoice, Fish-Speech, StyleTTS2, VibeVoice)
Architecture Design: A text description of the system architecture diagram, showing how the components connect and how data flows between them
Performance Benchmarks: Expected latency, quality scores (e.g., MOS for TTS, WER for ASR), and resource requirements
Implementation Roadmap: Step-by-step plan with an estimated timeline
Risk Assessment: Potential issues and mitigation strategies

Format as a professional technical report with clear sections and actionable recommendations.